AD-A163 542
UNCLASSIFIED

THE CALCULUS OF UNCERTAINTY IN ARTIFICIAL INTELLIGENCE AND EXPERT SYSTEMS (U)
GEORGE WASHINGTON UNIV, WASHINGTON DC, INST FOR RELIABILITY AND . . .
N D SINGPURWALLA ET AL. 15 JAN 86 F/G 9/2


THE  CALCULUS  OF  UNCERTAINTY  IN 
ARTIFICIAL  INTELLIGENCE  AND 
EXPERT  SYSTEMS* 


THE GEORGE WASHINGTON UNIVERSITY





DISTRIBUTION STATEMENT A

Approved for public release;
Distribution Unlimited






Proceedings  of  a  Conference 
held  on  December  26-29,  1984 


GWU/IRRA/Serial TR-86/2


The  George  Washington  University 
School  of  Engineering  and  Applied  Science 
Institute  for  Reliability  and  Risk  Analysis 




*This work was supported by the National Security Agency under
Grant N00014-85-G-0162 issued by the Office of Naval Research,
and by Contract N00014-85-K-0202 with The George Washington University.
The United States Government has a royalty-free license throughout
the world in all copyrightable material contained herein.




CONTENTS 


Foreword 

Key  Participants 
List  of  Attendees 


Probability  Judgment  in  Artificial  Intelligence 
and  Expert  Systems,  Glenn  Shafer 

Transcript  of  Oral  Presentation 
Comments  of  Discussants 
Transcript  of  Floor  Discussion 

Fuzzy  Sets  and  Possibility  Theory 
Citation  List,  Lotfi  A.  Zadeh 

Transcript  of  Oral  Presentation 
Comments  of  Discussants 
Transcript  of  Floor  Discussion 

The Probability Approach to the Treatment
of Uncertainty in Artificial Intelligence
and Expert Systems, Dennis V. Lindley


Transcript  of  Oral  Presentation 
Comments  of  Discussants 
Transcript  of  Floor  Discussion 

Probabilistic Expert Systems in Medicine:
Practical Issues in Handling Uncertainty,
David J. Spiegelhalter


Transcript  of  Oral  Presentation 
Comments  of  Discussants 
Transcript  of  Floor  Discussion 


General  Discussion  I 
General  Discussion  II 
Concluding  Discussion 


Retrospective  Comments,  Stephen  R.  Watson 




Foreword 


Despite the remarkable progress in the use and application of
artificial intelligence and expert systems techniques in the past ten
years, several fundamental issues remain unresolved.

One of these is how best to deal with uncertainty in the
conditions of interest involving the use of expert systems. Even with
the increased pace of discovery and innovation in the mathematical and
information sciences, there still remain to be resolved issues
pertaining to methods adequate for the treatment of uncertainty which
are acceptable to all practitioners. Obviously many philosophical and
methodological questions need to be addressed.

It was clearly recognized by researchers in government and
universities that a conference to address these issues and to at least
focus some of the thoughts of scientists, develop awareness and concern,
and share experiences would be a worthwhile happening. In particular,
the scientific and technical interests of the Office of Naval Research
and the National Security Agency were important factors in motivating
the organization of such a meeting. Edward J. Wegman of ONR and C.
Terrance Ireland and James Maar of NSA were responsible for stimulating
the concept and fostering its realization. Professor Nozer D.
Singpurwalla, Professor of Operations Research and of Statistics at the
George Washington University, was the key to accomplishing the
transformation from idea to reality, and was the organizer and driving
force to implement this common aspiration.

Accordingly,  plans  were  made  to  convene  a  conference,  the  first 
of  its  kind,  at  the  George  Washington  University.  The  GWU  Institute 
for  Reliability  and  Risk  Analysis  would  host  the  event  and  also  would  be 
able  to  provide  several  key  participants  who  are  experts  in  the  subject 
areas.  Good  fortune  had  arranged  for  Professor  Dennis  V.  Lindley  and 
Stephen  R.  Watson  of  Cambridge  University  to  be  visiting  at  the 
Institute,  and  the  Department  of  Operations  Research,  respectively,  at 
the  appropriate  time.  The  agencies  provided  sponsorship,  and  plans 
moved  forward.  The  conference  would  be  limited  to  a  small  number  of 
researchers and interested practitioners.

The  meeting  was  deliberately  restrained  to  be  a  low  key  event  and 
took place on December 28 and 29, 1984, at a time when the University
was closed for the winter holiday period. Nevertheless, the University
interest was not to be diminished. President Lloyd H. Elliott played a
personal  role  in  getting  the  conference  off  to  a  good  start  by  providing 
a  thoughtful  and  witty  talk  which  not  only  launched  the  meeting  in  an 
intellectually  stimulating  manner  but  also  clearly  demonstrated  the 
University's  support  for  the  conference  and  its  subject. 

This  report  is  an  attempt  to  document  the  proceedings  of  the 
conference  in  a  manner  that  will  provide  a  record  of  what  transpired  for 
the  sponsors  and  participants,  and  also  provide  resource  material  for 
those  interested  in  the  topic  from  any  of  several  perspectives. 


It  is  hoped  and  expected  that  the  original  material  prepared  for 
the  conference  will  eventually  be  published  in  the  open  scientific 
literature.  Now,  however,  this  is  a  record  of  the  actual  presentations 
and  discussions  (or  as  close  as  we  could  get  to  it)  by  the  participants 
as  well  as  the  original  technical  papers.  The  discussions  were  edited 
and  smoothed  only  very  lightly,  as  will  be  apparent  from  a  cursory 
reading.  As  was  to  be  expected,  not  all  of  the  participants  provided 
uniform  inputs  to  facilitate  the  documentation  task. 

A  genuine  shortcoming  of  these  proceedings  is  the  inability  to 
completely capture and reproduce the effectiveness of the speakers' use
of the overhead projector. We have included some of those slides as
selected by the speakers; however, much of the communication was due to
the  vigor  of  the  presentations  and  audience  interactions,  frequently 
making  use  of  spontaneous  but  valuable  on  the  spot  transparencies,  some 
of  which  unfortunately  cannot  be  reproduced  here. 

A most important factor in ensuring the success of the meeting
was that of the role played by the Moderator, Professor Morris DeGroot
of the Department of Statistics, Carnegie-Mellon University. His
conduct  of  the  meeting  during  its  two  days,  and  performance  as  a 
catalytic  agent,  interpreter,  and  clairvoyant  was  most  important  to  the 
outcome.  It  is  truly  felt  that  the  transcript  alone  cannot  adequately 
reflect  his  essential  contributions  in  this  regard.  To  those  who  were 
present,  it  was  easily  recognized  as  the  privilege  of  witnessing  a  most 
enjoyable  tour  de  force  by  a  scientist  who  is  uniquely  talented  and 
expertly  knowledgeable,  and  generous  with  his  ready  wit  and  humor. 

Grateful  acknowledgment  is  made  of  the  assistance  of  Professors 
Donald  Gross  and  Graham  W.  McIntyre  of  the  GWU  School  of  Engineering  and 
Applied  Science  in  solving  critical  administrative  problems  incident  to 
the  meeting,  and  to  Mrs.  Teresita  R.  Abacan  in  typing  this  report. 


Seymour M. Selig
Coordinating  Editor 


KEY  PARTICIPANTS 


Invited Speakers (in order of presentation):


Glenn Shafer, School of Business
University of Kansas
Lawrence, KS


Lotfi A. Zadeh, Computer Science Division
University of California, Berkeley
Berkeley, CA


Dennis V. Lindley, The George Washington University
(Visiting), Washington, DC


David Spiegelhalter, Medical Research Council Centre
Cambridge, England


Invited Discussants:


Arthur P. Dempster, Department of Statistics
Harvard University
Cambridge, MA


Stephen R. Watson, The George Washington University
(Visiting), Washington, DC


Moderator: Morris H. DeGroot, Department of Statistics
Carnegie-Mellon University, Pittsburgh, PA


ATTENDANCE  LIST 


CONFERENCE  ON  THE  CALCULUS  OF  UNCERTAINTY  IN 
ARTIFICIAL  INTELLIGENCE  AND  EXPERT  SYSTEMS 

December 27-28, 1984


PATRICK BAILEY
Naval Ocean Systems Center
271 Catalina Blvd.
San Diego, CA 92152

LEE S. BROWNSTON
Department of Computer Science
Carnegie-Mellon University
Pittsburgh, PA 15213

MARVIN S. COHEN
Decision Science Consortium, Inc.
7700 Leesburg Pike, Suite 1121
Falls Church, VA 22043

S. G. CORTELYOU
SEAS, CEEP
George Washington University
Washington, DC 20052

MORRIS H. DEGROOT
Department of Statistics
Carnegie-Mellon University
Pittsburgh, PA 15213

ARTHUR P. DEMPSTER
Department of Statistics
Harvard University
Cambridge, MA 02138

LLOYD H. ELLIOTT
President
George Washington University
Washington, DC 20052

PAUL S. FISCHBECK
Operations Research Department
Naval Postgraduate School
Monterey, CA 93943

DONALD GROSS
Department of Operations Research
George Washington University
Washington, DC 20052

HENRY HAMBURGER
Naval Research Laboratory
Washington, DC 20375

HERBERT H. HOLMAN
Department of Defense
Ft. George G. Meade, MD 20755

C. TERRANCE IRELAND
National Security Agency
Ft. George G. Meade, MD 20755

JOHN KAY
Department of Engineering Administration
George Washington University

ARTHUR D. KIRSCH
Dept. of Statistics/Computer & Info. Systems
George Washington University
Washington, DC 20052


AUGUSTINE KONG
Harvard University
Statistics Department
One Oxford Street, 6th Floor
Cambridge, MA 02138

JAY LIEBOWITZ
Department of Management Science
George Washington University
Washington, DC 20052

DENNIS V. LINDLEY
Department of Operations Research
George Washington University
Washington, DC 20052

JAMES RICHARD MAAR
National Security Agency
Ft. George G. Meade, MD 20755

GABRIEL PEI
IBM Federal Systems Division
9500 Godwin Drive
Manassas, VA 22110

RICHARD Y. PEI
The Rand Corporation
2100 M St., N.W.
Washington, DC 20037

JOHN D. PRANCE
Department of Defense
Ft. George G. Meade, MD 20755

PHILIP N. REEVES
Department of Health Services Adm.
George Washington University
Washington, DC 20052

SEYMOUR M. SELIG
Institute for Reliability and Risk Analysis
George Washington University
Washington, DC 20052

GLENN SHAFER
School of Business
University of Kansas
Lawrence, Kansas 66045

ALAN L. MEYROWITZ
Office of Naval Research
Information Sciences Division
800 N. Quincy Street
Arlington, VA 22217

J. RANDOLPH SIMPSON
Office of Naval Research
Arlington, VA 22217


NOZER D. SINGPURWALLA
Department of Operations Research
George Washington University
Washington, DC 20052

ROBERT SMYTHE
Department of Statistics
George Washington University
Washington, DC 20052

HENRY SOLOMON
Graduate School of Arts & Sciences
George Washington University
Washington, DC 20052

RICHARD M. SOLAND
Department of Operations Research
George Washington University
Washington, DC 20052

REFIK SOYER
Department of Operations Research
George Washington University
Washington, DC 20052

STEPHEN R. WATSON
Cambridge University
Engineering Dept.
Control and Management Systems Div.
Mill Lane
Cambridge, CB2 1RX England

EDWARD J. WEGMAN
Office of Naval Research
800 N. Quincy Street
Arlington, VA 22217

BEN P. WISE
Carnegie-Mellon University
Robotics Institute and Dept. of Engineering and Public Policy
Schenley Park
Pittsburgh, PA 15213

RONALD R. YAGER
Machine Intelligence Institute
Iona College
New Rochelle, NY 10801

LOTFI A. ZADEH
Computer Science Division
University of California
Berkeley, CA 94720

DAVID SPIEGELHALTER
MRC Biostatistics Unit
Medical Research Council Centre
Hills Road
Cambridge, CB2 2QH England


PROBABILITY JUDGMENT IN ARTIFICIAL INTELLIGENCE
AND EXPERT SYSTEMS


Glenn  Shafer 
School  of  Business 
University  of  Kansas 


CONTENTS

1. The Emergence of Probability in Artificial Intelligence

2. Bayesian and Belief-Function Arguments

2.1. Two Strategies for Probability Judgment

2.2. The Frequentist vs. Bayesian Deadlock

2.3. Constructive Probability

2.4. The Language of Belief Functions

2.5. Conclusion

3. The Attempt to Use Probability in Production Systems

3.1. Bayesian Networks

3.2. Certainty Factors and Belief Functions

3.3. Conclusion

4. The Construction of Arguments

References


This  paper  was  prepared  for  the  Conference  on  the  Calculus  of 
Uncertainty  in  Artificial  Intelligence  and  Expert  Systems  held  at 
George  Washington  University,  December  27  and  28,  1984.  Research 
for  the  paper  was  partially  supported  by  NSF  grant  IST-8405210. 


I  have  been  asked  to  speak  on  the  use  of  belief  functions  in 
artificial  intelligence  and  expert  systems.  For  the  sake  of 
perspective,  I  propose  to  address  the  broader  topic  indicated  by 
my  title.  The  theory  of  belief  functions  is  part  of  the  theory 
of  probability  judgment,  and  a  general  understanding  of  the  role 
of  probability  judgment  in  artificial  intelligence  can  help  us 
understand  the  particular  role  of  belief  functions. 

I will not attempt to evaluate all the ways in which probability
has been used in artificial intelligence, nor even all the
ways in which belief functions have been used. Instead, I will
aim for some general insights into the interaction between probability
ideas and artificial intelligence ideas. Many of my
comments will be historical. I hope readers will forgive me for
those cases where I belabor the obvious or repeat the well-known;
my excuse is that I hope to reach a dual audience: students of
probability who may not know very much about artificial intelligence,
and students of artificial intelligence who may not know
very much about probability.

The first two sections of the paper are introductory in
nature. Section 1 considers the reasons for the artificial
intelligence community's initial disinterest in probability and
its recent change of heart, and outlines the paper's conclusions
about how current expert systems fall short of putting probability
judgment into artificial intelligence. Section 2 deals
with probability judgment without reference to artificial intelligence;
here I discuss the split between Bayesian and non-Bayesian
methods and place the theory of belief functions in this
historical context.

Section 3 studies some strands of the development within
artificial intelligence of ideas about using probability judgment
in expert systems. Here we see how the general issues that
separate the Bayesian and belief-function theories appear in the
context of expert systems, and we gain some insight into why
flexibility is harder to achieve with probability judgment than
with other kinds of reasoning. Section 4 discusses the problem
of giving an artificial intelligence a genuine capacity for
probability judgment.

1.  The  Emergence  of  Probability  in  Artificial  Intelligence 

Until recently, the artificial intelligence community showed
relatively little interest in probability. There is little probability,
for example, in the three-volume Handbook of Artificial
Intelligence, published in 1981 and 1982. During the past two or
three years, however, probability and the management of uncertainty
in intelligent systems has become a widely discussed
topic. Why the initial disinterest, and why the change?

The reasons for the initial disinterest are clear. Probabilities
are numbers, and number crunching is just what artificial
intelligence was supposed not to be. When the artificial intelligence
community was founded, computers were used mainly for
number crunching. They were impressively good at this, but they
were not intelligent. Intelligence seemed to require more general
kinds of symbol manipulation.

Moreover, when we begin to think about computer programs
that will match the achievements of human intelligence, we find
that we are thinking about programs with non-numerical inputs and
outputs. What place is there for talk about numbers in the case
of these programs? They are merely sets of rules for going from
the inputs to the outputs, and while it might be possible to
identify some intermediate steps that are analogous to operations
on numerical probabilities, it seems pointless to do so. It
seems better to tell what is really going on.

The prejudice against numbers in general and probabilities
in particular has not entirely disappeared from artificial intelligence,
and the argument sketched in the preceding paragraph is
still made. Paul Cohen and Jon Doyle made it during the panel
discussion on uncertainty at the meeting of the American Association
for Artificial Intelligence in Austin last summer. Cohen
went on to argue that probability talk should be replaced by talk
about reasons and endorsements; we should spell out what endorsements
a program requires before it will take a given action or
draw a given conclusion (Cohen, 1983). Doyle argued that the
problem of combining uncertain evidence should be solved not by
numerical calculations but by the techniques of non-monotonic
logic (Doyle, 1979).

But the factors that caused this prejudice have substantially
changed. The vague idea that artificial intelligence can
be defined largely through the contrast with number crunching has
been replaced by the equally vague but equally powerful idea that
intelligence is produced by complexity and by access to large
amounts of knowledge. And two specific openings have appeared:


(1) The absolute ban on numerical inputs has been
dropped. In addition to programs that try to match aspects of
human intelligence, artificial intelligence is now also concerned
with expert systems and other intelligent systems that
interact with human users and can use numerical inputs supplied
by these users.

(2) The artificial intelligence community has absorbed
David Marr's views on levels of explanation. In his work on
vision, Marr convincingly made the point that full understanding
of an intelligent system involves explanation at various
levels. In addition to explanation at the level of implementation
(what is really going on) we also need explanation at
more abstract levels. "It's no use, for example, trying to
understand the fast Fourier transform in terms of resistors as
it runs on an IBM 370." (Marr, 1982, p. 337) Understanding
of this point takes the rhetorical force out of the argument
that there is no place for probability ideas when inputs and
outputs are non-numerical.

Most of the current interest in probability in artificial intelligence
is the result of opening (1). In many areas it is impossible
to build expert systems without the use of probability. But
I will argue in this paper that opening (2) is a more genuine
opening for probability in artificial intelligence. Because of
(2), we can now recognize the value to an artificial intelligence
of an ability to design probability arguments and generate the
numerical judgments they require.

The ban on numerical inputs in artificial intelligence was
dropped because the artificial intelligence community became
interested in expert systems. Why did this happen? The answer
is that the community discovered ways of building expert systems
that incorporated ideas that seemed to reflect important aspects
of human intelligence. As I explain in section 3 below, most of
the expert systems developed within artificial intelligence have
been production systems, and production systems seem to have the
flexibility in acquiring and using knowledge that is characteristic
of intelligence.

I argue in this paper that the expert systems we can now
build to use probability judgments do not have this kind of
flexibility and hence should not be classed under the heading of
artificial intelligence. The problem seems to be that probability
judgment requires an overall design and hence cannot be
achieved by relatively unstructured methods of programming applied
to unstructured probability judgments.

As a result of the explosion of interest in expert systems,
the field of artificial intelligence is now struggling to maintain
its sense of identity. The idea of an expert system began
in artificial intelligence, but any system with expert capabilities
can justifiably claim the name, whether it is written in
LISP or FORTRAN, and many systems developed outside of artificial
intelligence have more impressive expert capabilities than those
developed inside it. It is clear, therefore, that artificial
intelligence must withdraw from its embrace of the whole field of
expert systems in order to maintain intellectual coherence. But
it is unclear just what parts of the field of expert systems will
remain in the embrace. My suggestion here is that artificial
intelligence will retain its newfound interest in probability but
will look beyond the current expert systems to deeper uses of
probability ideas.


2.  Bayesian  and  Belief-Function  Arguments 

In this section I review some general ideas about probability
judgment, without reference to the particular problems of
artificial intelligence. I begin by sketching a way of looking
at the frequentist vs. Bayesian controversy, a controversy that
has dominated discussions of probability judgment for more than a
century. After developing a constructive understanding of the
Bayesian theory, I introduce another constructive theory, the
theory of belief functions. I argue that both theories should be
thought of as languages for expressing probability judgments and
constructing probability arguments.


2.1.  Two  Strategies  for  Probability  Judgment.  What  we  now 
call  the  mathematical  theory  of  probability  was  originally  called 
the  theory  of  games  of  chance.  Probability  was  an  entirely 
different  topic;  something  was  probable  when  there  was  a  good 
argument  or  good  authority  for  it.  When  James  Bernoulli  and 
others  began  to  use  the  word  probability  in  connection  with  the 
theory  of  games  of  chance,  they  were  expressing  the  ambition  that 
this  theory  might  provide  a  general  framework  for  evaluating 
evidence  and  weighing  arguments.  But  just  how  might  this  work? 
How  can  the  theory  of  games  of  chance  help  us  evaluate  evidence? 

In the nineteenth century, it became clear that there are
two distinct strategies for relating evidence to the picture of
chance. Today, these two strategies might be called the frequentist
and Bayesian strategies, but in order to avoid some of the
connotations of these names, let me call them, for the moment,
the direct probability and conditional probability strategies.

The  direct  probability  strategy  relies  on  direct  application 
of  the  idea  that  in  life,  as  in  games  of  chance,  what  happens 
most often is most likely to happen in a particular case under
consideration.  The  ideal  kind  of  evidence  for  this  strategy  is 
knowledge  of  the  frequency  of  outcomes  in  similar  cases.  I 
assign  a  98%  probability  to  the  prediction  that  a  student  who 
first  appears  three  weeks  after  the  beginning  of  my  elementary 
statistics  course  will  not  be  able  to  pass  the  course,  because  it 
has  almost  always  turned  out  that  way  in  the  past. 

The conditional probability strategy uses the picture of
chance in a deeper way. It observes that games of chance unfold
step-by-step, with the probabilities for different possible final
outcomes changing at each step, and it suggests that the accumulation
of evidence should change probabilities in a similar step-by-step
way. Thus my probability for whether the late-appearing
student will pass my course should change when I learn more about
his history and circumstances, just as my probability for whether
two successive rolls of a die will add to nine will change when I
learn the result of the first roll. The conditional probability
strategy usually leads to a more complicated argument than the
direct probability strategy, since it involves construction of a
probability measure over a more complicated frame and then the
reduction of this measure and frame by conditioning.
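
The die example can be checked by enumeration. The following Python
sketch is illustrative only: it assumes a fair six-sided die and
independent rolls, and the function name prob_sum_is is hypothetical,
not taken from the paper. It builds the frame of all 36 ordered pairs
of rolls, then reduces the frame by conditioning on the first roll.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # assumption: a fair six-sided die


def prob_sum_is(target, first_roll=None):
    """Probability that two rolls add to `target`.

    If `first_roll` is given, the frame is reduced to the outcomes
    consistent with that first roll (conditioning).
    """
    frame = [
        (a, b)
        for a, b in product(FACES, FACES)
        if first_roll is None or a == first_roll
    ]
    favorable = [pair for pair in frame if sum(pair) == target]
    return Fraction(len(favorable), len(frame))


# Probability before the first roll is known.
prior = prob_sum_is(9)

# Probability after learning each possible result of the first roll.
conditional = {first: prob_sum_is(9, first) for first in FACES}

print("before first roll:", prior)
for first, p in conditional.items():
    print(f"first roll = {first}:", p)
```

Before the first roll the probability is 1/9. A first roll of 1 or 2
drops it to 0, and a first roll of 3, 4, 5, or 6 raises it to 1/6.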


In general, there is not, I believe, any a priori reason to
prefer one of these two strategies to the other. We cannot say
that it is normative to use one and irrational to use the other.
They are both strategies for producing arguments, and it is the
arguments that must be evaluated as convincing or unconvincing.

It may be most convincing to lump this late-appearing student
with all my past late-appearing students, with the general excuse
that particulars have not made much difference in the past. Or I
may have had enough experience with late-appearing students like
this one on some particulars that I can make a more convincing
direct probability argument by looking at the past frequency of
success just for these late-appearing students. Or, on the other
hand, I may have the experience and insight needed to convincingly
make probability judgments from which I can construct a probability
measure that I can condition on the particulars. The
issue cannot be settled in the abstract, without reference to the
experience I bring to bear on the problem.

I also believe that neither of the two strategies is inherently
more objective or subjective than the other. It is true
that the direct probability strategy, since it tends to consider
broader classes, is more likely to result in probability judgments
based on actual frequency counts. But the objectivity of
these frequencies must always be coupled with a subjective judgment
of their relevance. And even with broad classes we most
often have hunches and impressions rather than actual counts.

Historically, however, the direct probability strategy
has come to be associated with claims to objectivity, while the
conditional probability approach has come to be associated with
claims to rationality. This fact seems to be a result of efforts
to square the interpretation of probability with the empiricist
and positivist philosophical trends of the late nineteenth and
early twentieth centuries.

2.2. The Frequentist vs. Bayesian Deadlock. Laplace, writing
at the beginning of the nineteenth century, was able to
define numerical probability as the measure of the "reason we
have to believe." But by the middle of the nineteenth century,
many students of probability were looking for a more empirical
definition. They found this definition in the idea of frequency,
and they proceeded to reject those applications of probability
theory that could not be based on observed frequencies. In
particular, they rejected Laplace's method of calculating the
probability of causes, which is a special case of the conditional
probability strategy.

The frequentist philosophy severely restricted the domain of
application of numerical probability, and those who wanted to use
numerical probability more generally were forced to search for a
philosophical foundation for the conditional probability strategy
that would fit the positivist mind-set. Such a philosophical
foundation was finally established in the twentieth century by
Ramsey, de Finetti, and especially Savage. These authors conceived
the idea that subjective probability should be given a
behavioral and hence positivist interpretation: a person's probabilities
should be derivable from his choices. They formulated
postulates for what they called rational behavior, postulates
which assure that a person's choices do determine numerical
probabilities. And they argued that it is normative to follow
these postulates and hence normative to have subjective probabilities.

During the past two decades, the philosophical foundation
provided by Savage's postulates has led to a remarkable resurgence,
both mathematical and practical, of the conditional probability
strategy. The resulting body of theory has been called
"Bayesian," because the conditional probability strategy often
uses Bayes's theorem.

Though  the  new  Bayesian  philosophy  has  played  a  historically 
valuable  role  in  rescuing  the  conditional  probability  strategy 
from its frequentist opponents, it has its own obvious shortcomings.
Most important, perhaps, is its inability to explain
how  the  quality  of  a  probability  analysis  depends  on  the  availa¬ 
bility  and  quality  of  relevant  evidence.  Whereas  the  frequentist 
philosophy  tries  to  limit  applications  of  probability  to  models 
for  which  we  have  clearly  relevant  and  objective  frequency 
counts,  there  is  nothing  in  the  Bayesian  philosophy  to  make  our 
choice  of  a  model  depend  in  any  way  on  the  availability  of  rele¬ 
vant  evidence.  The  postulates  apply  equally  to  any  model. 

We  have,  then,  a  deadlock  between  two  inadequate  philos¬ 
ophies  of  probability.  On  the  one  side,  the  frequentist  philoso¬ 
phy,  which  recognizes  the  relevance  of  evidence  but  tries  to 
justify  claims  to  objectivity  by  limiting  numerical  probability 
judgment  to  cases  where  the  evidence  is  of  an  ideal  form;  on  the 
other  side,  the  Bayesian  philosophy,  which  recognizes  the  subjec¬ 
tivity  of  all  probability  judgment  but  ignores  the  quality  of 


evidence  and  claims  it  is  normative  to  force  all  probability 
judgment  into  one  particular  mold. 

We have been caught in this deadlock for three decades. We
have tired of it, and we are inclined to ask the two sides to
compromise (see, e.g., Box, 1980). But we have not been able to
find  a  philosophical  foundation  for  probability  judgment  that  can 
resolve  the  deadlock. 

I  believe  that  the  way  out  of  the  deadlock  is  to  back  up  and 
recognize  that  a  positivist  philosophical  account  of  probability 
is  no  longer  needed.  Our  intellectual  culture  has  moved  away 
from  positivism  and  towards  various  sorts  of  pragmatism,  and  once 
we recognize this we will be free to discard both the frequentists'
claims to objectivity and the Bayesians' claims to normativeness.

2.3.  Constructive  Probability.  In  several  recent  papers 
(especially  Shafer,  1981,  and  Shafer  and  Tversky,  1985)  I  have 
proposed  the  name  "constructive  probability"  for  the  pragmatic, 
post-positivist  foundation  that  I  think  we  need  for  probability 
judgment.  The  idea  is  that  numerical  probability  judgment  in¬ 
volves  fitting  an  actual  problem  to  a  scale  of  canonical  exam¬ 
ples.  The  canonical  examples  usually  involve  the  picture  of 
chance  in  some  way,  but  different  choices  of  canonical  examples 
are  possible,  and  these  different  choices  provide  different  theo¬ 
ries  of  subjective  probability,  or,  if  you  will,  different  lan¬ 
guages  in  which  to  express  probability  judgments.  No  matter  what 
language  is  used,  the  judgments  expressed  are  subjective;  the 
subjectivity  enters  when  we  judge  that  the  evidence  in  our  actual 


problem  matches  in  strength  and  significance  the  evidence  in  the 
canonical  example. 

Within  a  given  language  of  probability  judgment,  there  can 
be  different  strategies  for  fitting  the  actual  problem  to  the 
scale  of  canonical  examples.  The  direct  and  conditional  proba¬ 
bility  strategies  described  above  live,  I  think,  in  the  same 
probability  language,  the  language  in  which  evidence  about  actual 
questions  is  fit  to  canonical  examples  where  answers  are  deter¬ 
mined  by  known  chances.  We  may  call  this  language  the  Bayesian 
language.  (For  a  more  detailed  account  of  different  strategies 
that  are  available  within  the  Bayesian  language,  see  Shafer  and 
Tversky,  1985.  The  distinction  between  the  direct  and  condi¬ 
tional  probability  strategies  corresponds  to  the  distinction  that 
is  made  there  between  total-evidence  and  conditioning  designs.) 

The  constructive  viewpoint  tells  us  that  when  we  work  within 
the  Bayesian  language  we  must  make  a  judgment  about  how  far  to 
take  the  conditional  probability  strategy  in  each  particular 
problem.  And  we  make  this  judgment  on  the  basis  of  the  availa¬ 
bility  of  evidence  to  support  the  conditional  and  unconditional 
probability  judgments  that  are  required. 

It  may  be  useful  to  elaborate  this  point.  Suppose  we  want 
to  make  probability  judgments  about  a  frame  of  discernment  S.  (A 
frame  of  discernment  is  a  list  of  possible  answers  to  a  question; 
so  this  means  we  want  to  make  probability  judgments  about  which 
answer  is  correct.)  We  reflect  on  what  relevant  evidence  we 
have, and produce a list E_1, ..., E_n of facts that seem to summarize
this evidence adequately. The conditional probability
strategy amounts to standing back from our knowledge of these n
facts, pretending that we did not yet know them, and constructing
a probability measure over a frame that considers not only the
question considered by S but also the question whether E_1, ..., E_n
are or are not true; typically we construct this measure by
making probability judgments P(s) and P(E_1 & ... & E_n | s) for each s
in  S.  The  problem  with  this  strategy  is  that  we  now  need  to  look 
for  evidence  on  which  to  base  these  probability  judgments.  We 
have  used  our  best  evidence  up,  as  it  were,  but  now  we  have  an 
even  larger  judgmental  task  than  before.  According  to  the  behav- 
iorist  Bayesian  theory,  there  is  no  problem — it  is  normative  to 
have  the  requisite  probabilities,  whether  we  can  identify  rele¬ 
vant  evidence  or  not.  But  according  to  the  constructive  view¬ 
point,  there  is  a  problem,  a  problem  which  limits  how  far  we  want 
to  go.  We  may  want  to  apply  the  conditional  probability  strategy 
to  some  of  the  Ei,  but  we  may  want  to  reserve  the  others  to  help 
us  make  the  probability  judgments  (see  Shafer  and  Tversky,  1985). 

2.4.  The  Language  of  Belief  Functions.  Whereas  the 
Bayesian  probability  language  uses  canonical  examples  where  known 
chances are attached directly to the possible answers to the question
asked, the language of belief functions uses canonical
examples  where  known  chances  may  be  attached  only  to  the  possible 
answers  to  a  related  question. 

Suppose,  indeed,  that  S  and  T  denote,  respectively,  the 
possible  answers  to  two  distinct  but  related  questions.  When  we 
say  that  these  questions  are  related,  we  mean  that  a  given  answer 
to one of the questions may not be compatible with all the possible
answers to the other. Let us write "sCt" when s is an
element  of  S,  t  is  an  element  of  T,  and  s  and  t  are  compatible. 
Given a probability measure P over S (I assume for simplicity
that P is defined for all subsets of S), we may define a function
Bel on subsets of T by setting

$$\mathrm{Bel}(B) = P\{\, s \mid \text{if } sCt, \text{ then } t \text{ is in } B \,\} \tag{1}$$

for  each  subset  B  of  T.  The  right-hand  side  of  (1)  is  the 
probability  that  P  gives  to  those  answers  to  the  question  consi¬ 
dered  by  S  that  require  the  answer  to  the  question  considered  by 
T  to  be  in  B;  the  idea  behind  (1)  is  that  this  probability  should 
be  counted  as  reason  to  believe  that  the  latter  answer  is  in  B. 

We  might,  of  course,  have  more  direct  evidence  about  the  question 
considered  by  T,  but  if  we  do  not,  or  if  we  want  to  leave  other 
evidence  aside  for  the  moment,  then  we  may  call  Bel(B)  a  measure 
of  the  reason  we  have  to  believe  B  based  just  on  P. 

I  call  the  function  Bel  given  by  (1)  the  belief  function 
obtained  by  extending  P  from  S  to  T.  A  probability  measure  P  is 
a  special  kind  of  belief  function;  this  is  just  the  case  where 
(i)  S=T  and  (ii)  sCt  if  and  only  if  s=t. 

All  the  usual  devices  of  probability  are  available  to  the 
language  of  belief  functions,  but  in  general  they  are  applied  in 
the  background,  at  the  level  of  S,  before  extending  to  degrees  of 
belief  on  T,  the  frame  of  interest.  Thus  the  language  of  belief 
functions  is  a  generalization  of  the  Bayesian  language.  I  have 
studied  the  language  of  belief  functions  in  detail  in  earlier 
work--see  especially  Shafer  (1976,1985).  Here  I  will  use  some 
examples  of  (1)  to  illustrate  the  language  and  to  contrast  it 
with  the  Bayesian  language. 


Example  1.  Is  Fred,  who  is  about  to  speak  to  me,  going  to 
speak  truthfully,  or  is  he,  as  he  sometimes  does,  going  to  speak 
carelessly,  saying  something  that  comes  into  his  mind,  but  the 
truth  of  which  he  does  not  know?  Let  S  denote  the  possible 
answers to this question: S={truthful,careless}. Suppose I know
from  experience  that  Fred's  announcements  are  truthful  reports  on 
what  he  knows  about  80%  of  the  time  and  are  careless  statements 
the  other  20%  of  the  time.  Then  I  have  a  probability  measure  P 
over S: P{truthful}=.8, P{careless}=.2.

Are the streets outside slippery? Let T denote the possible
answers to this question: T={yes,no}. And suppose Fred's announcement
turns out to be, "The streets outside are slippery."
Taking account of this, I have a compatibility relation between S
and T: "truthful" is compatible with "yes" but not with "no,"
while "careless" is compatible with both "yes" and "no." Applying
(1), I find

$$\mathrm{Bel}(\{\text{yes}\}) = .8 \quad \text{and} \quad \mathrm{Bel}(\{\text{no}\}) = 0; \tag{2}$$

Fred's  announcement  gives  me  an  80%  reason  to  believe  that  the 
streets  are  slippery  outside,  but  no  reason  to  believe  that  they 
are  not. 
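
As a minimal sketch of how (1) produces (2), the following Python fragment computes Bel from P and the compatibility relation. The function name `belief` and the dictionary encoding of the relation are assumptions made for illustration; they are not notation from the text.

```python
def belief(prob, compatible, subset):
    """Bel(B) as in (1): total probability of those s in S for which
    every t compatible with s lies in B.

    prob       -- dict mapping each s in S to P{s}
    compatible -- dict mapping each s to the set of t in T with sCt
                  (assumed non-empty for every s in this sketch)
    subset     -- the subset B of T
    """
    target = set(subset)
    return sum(p for s, p in prob.items() if compatible[s] <= target)


# Example 1: S is about Fred's reliability, T about the streets.
prob = {"truthful": 0.8, "careless": 0.2}
compatible = {
    "truthful": {"yes"},        # a truthful Fred forces "yes"
    "careless": {"yes", "no"},  # a careless Fred settles nothing
}

bel_yes = belief(prob, compatible, {"yes"})  # .8, as in (2)
bel_no = belief(prob, compatible, {"no"})    # 0, as in (2)
```

Note that Bel({yes}) and Bel({no}) need not sum to one; the remaining .2 is committed only to the whole frame T.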

How  might  a  Bayesian  argument  using  this  evidence  go?  The 
direct  probability  strategy  would  use  all  my  evidence,  Fred's 
announcement  included,  to  make  a  direct  probability  judgment 
about  whether  the  streets  are  slippery.  But  if  I  want  an  argu¬ 
ment  that  uses  the  judgment  that  Fred  is  80%  reliable  as  one  in¬ 
gredient,  then  I  will  use  a  conditional  probability  strategy. 

This  strategy  requires  two  further  probability  judgments:  (1)  A 


prior  probability,  say  p,  for  the  proposition  that  the  streets 
are  slippery;  this  will  be  a  judgment  based  on  evidence  other 
than  Fred's  announcement.  (2)  A  conditional  probability,  say  q, 
that  Fred’s  announcement  will  be  accurate  even  though  it  is 
careless.  Given  these  ingredients,  I  can  calculate  a  Bayesian 
probability  that  the  streets  are  slippery  given  Fred's  announce¬ 
ment  and  my  other  evidence: 

$$P(\text{slippery} \mid \text{announcement}) = \frac{.8p + .2pq}{.8p + .2pq + .2(1-p)(1-q)}. \tag{3}$$

Is  the  Bayesian  argument  (3)  better  than  the  belief-function 
argument (2)? This depends on whether I have the evidence required.
If I do have evidence to support the judgments p and q (if,
that is to say, my situation really is quite like a situation
where the streets and Fred are governed by known chances), then
(3) is a good argument, clearly more convincing than (2) because
it  takes  more  evidence  into  account.  But  if  the  evidence  on 
which  I  base  p  and  q  is  of  much  lower  quality  than  the  evidence 
on  which  I  base  the  number  80%,  then  (2)  will  be  more  convincing. 
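
To see how strongly (3) depends on the two extra judgments, one can evaluate it for a few values of p and q. The sketch below is only an illustration; the function name `posterior_slippery` and the trial values are assumptions, not figures from the text.

```python
def posterior_slippery(p, q, reliability=0.8):
    """Formula (3): P(slippery | Fred announces "slippery").

    p -- prior probability that the streets are slippery
    q -- probability that a careless announcement is accurate anyway
    reliability -- probability that Fred is truthful (.8 in Example 1)
    The denominator is zero only in degenerate cases such as p = 0, q = 1.
    """
    careless = 1.0 - reliability
    numerator = reliability * p + careless * p * q
    denominator = numerator + careless * (1.0 - p) * (1.0 - q)
    return numerator / denominator


# Hypothetical trial values for p, holding q fixed at .5.
low_prior = posterior_slippery(0.1, 0.5)   # 0.5
even_prior = posterior_slippery(0.5, 0.5)  # 0.9
high_prior = posterior_slippery(0.9, 0.5)
```

With the same 80% judgment about Fred, the answer ranges from .5 to nearly 1 as p varies, which is why the quality of the evidence behind p and q matters.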

The  traditional  debate  between  the  frequentist  and  Bayesian 
views  has  centered  on  the  quality  of  evidence  for  prior  proba¬ 
bilities.  It  is  worth  remarking,  therefore,  that  we  might  well 
feel  that  q,  rather  than  p,  is  the  weak  point  in  the  argument 
(3).  I  probably  will  have  some  other  evidence  about  whether  it 
is  slippery  outside,  but  I  may  not  have  any  idea  about  how  likely 
it  is  that  Fred's  careless  remarks  will  accidentally  be  true. 

A  critic  of  the  belief-function  argument  (2)  might  be 
tempted  to  claim  that  the  Bayesian  argument  (3)  shows  (2)  to  be 
wrong even if I do lack the evidence needed to supply p and q.


Formula  (3)  gives  the  correct  probability  for  whether  the  street 
is slippery, the critic might contend, even if I cannot say what
this  probability  is,  and  it  is  almost  certain  to  differ  from  (2). 
This  criticism  is  fundamentally  misguided.  In  order  to  say  that 
(3)  gives  the  "correct"  probability,  I  must  be  able  to  convinc¬ 
ingly  compare  my  situation  to  the  picture  of  chance.  And  my 
inability  to  model  Fred  when  he  is  being  careless  is  not  just  a 
matter  of  not  knowing  the  chances — it  is  a  matter  of  not  being 
able  to  fit  him  into  a  chance  picture  at  all. 

Example  2.  Suppose  I  do  have  some  other  evidence  about 
whether the streets are slippery: my trusty indoor-outdoor thermometer
says that the temperature is 31° Fahrenheit, and I know
that  because  of  the  traffic  ice  could  not  form  on  the  streets  at 
this  temperature. 

My  thermometer  could  be  wrong.  It  has  been  very  accurate  in 
the  past,  but  such  devices  do  not  last  forever.  Suppose  I  judge 
that  there  is  a  99%  chance  that  the  thermometer  is  working  pro¬ 
perly,  and  I  also  judge  that  Fred's  behavior  is  independent  of 
whether  it  is  working  properly  or  not.  (For  one  thing,  he  has 
not  been  close  enough  to  my  desk  this  morning  to  see  it.)  Then  I 
have  determined  probabilities  for  the  four  possible  answers  to 
the  question,  "Is  Fred  being  truthful  or  careless,  and  is  the 
thermometer  working  properly  or  not?"  For  example,  I  have  deter¬ 
mined  the  probability  .8x.99=.792  for  the  answer  "Fred  is  being 
truthful,  and  the  thermometer  is  working  properly."  All  four 
possible  answers,  together  with  their  probabilities,  are  shown  in 
the  first  two  columns  of  Table  1.  We  may  call  the  set  of  these 


four  answers  our  new  frame  S. 

Taking  into  account  what  Fred  and  the  thermometer  have  said, 
I  have  the  compatibility  relation  between  S  and  T  given  in  the 
last  column  of  the  table.  (Recall  that  T  considers  whether  the 
streets are slippery; T={yes,no}.) The element (truthful,working)
of S is ruled out by this compatibility relation (since Fred
and the thermometer are contradicting each other, they cannot
both  be  on  the  level);  hence  I  condition  the  initial  probabili¬ 
ties  by  eliminating  the  probability  for  (truthful, working)  and 
renormalizing  the  three  others.  The  resulting  posterior  proba¬ 
bilities  on  S  are  given  in  the  third  column  of  the  table. 

Finally,  applying  (1)  with  these  posterior  probabilities  on 
S,  I  obtain  the  degrees  of  belief 

$$\mathrm{Bel}(\{\text{yes}\}) = .04 \quad \text{and} \quad \mathrm{Bel}(\{\text{no}\}) = .95. \tag{4}$$

This result reflects the fact that I put much more trust in the
thermometer  than  in  Fred. 

                          Probability of s       Elements of T
s                         Initial   Posterior    compatible with s

(truthful, working)       .792      0            none
(truthful, not)           .008      .04          yes
(careless, working)       .198      .95          no
(careless, not)           .002      .01          yes, no

Table 1.

The preceding calculation is an example of Dempster's rule
of combination for belief functions. Dempster's rule combines
two or more belief functions defined on the same frame but based
on independent arguments or items of evidence; the result is a
belief function based on the pooled evidence. In this case the
belief function given by (2), which is based on Fred's testimony
alone, is being combined with the belief function given by

$$\mathrm{Bel}(\{\text{yes}\}) = 0 \quad \text{and} \quad \mathrm{Bel}(\{\text{no}\}) = .99, \tag{5}$$

which is based on the evidence of the thermometer alone. In
general, as in this example, Dempster's rule corresponds to the
formation and subsequent conditioning of a product measure in the
background. See Shafer (1985) for a precise account of the
independence conditions needed for Dempster's rule.
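
The calculation of Example 2 can be sketched in Python as the formation of a product measure followed by conditioning. This is an illustrative sketch under the independence judgment made above; the names `combine` and `belief` and the dictionary encoding are assumptions, not part of the text.

```python
from itertools import product


def combine(prob_a, compat_a, prob_b, compat_b):
    """Form the product measure on pairs (a, b), drop the pairs whose
    compatible answers contradict each other, and renormalize."""
    joint, compat = {}, {}
    for (a, pa), (b, pb) in product(prob_a.items(), prob_b.items()):
        common = compat_a[a] & compat_b[b]
        if common:  # keep only pairs that leave some answer in T possible
            joint[(a, b)] = pa * pb
            compat[(a, b)] = common
    total = sum(joint.values())
    posterior = {s: p / total for s, p in joint.items()}
    return posterior, compat


def belief(prob, compatible, subset):
    """Bel(B) as in (1), applied to the conditioned measure."""
    target = set(subset)
    return sum(p for s, p in prob.items() if compatible[s] <= target)


# Fred says "slippery"; the thermometer says it is not.
fred = {"truthful": 0.8, "careless": 0.2}
fred_compat = {"truthful": {"yes"}, "careless": {"yes", "no"}}
thermo = {"working": 0.99, "not": 0.01}
thermo_compat = {"working": {"no"}, "not": {"yes", "no"}}

posterior, compat = combine(fred, fred_compat, thermo, thermo_compat)
bel_yes = belief(posterior, compat, {"yes"})  # about .04, as in (4)
bel_no = belief(posterior, compat, {"no"})    # about .95, as in (4)
```

The pair (truthful, working) is dropped because its two compatibility sets do not intersect, which is the conditioning step described for Table 1.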

Example  3.  Dempster's  rule  applies  only  when  two  items  of 
evidence  are  independent,  but  belief  functions  can  also  be  de¬ 
rived  from  models  for  dependent  evidence. 

Suppose,  for  example,  that  I  do  not  judge  Fred's  testimony 
to  be  independent  of  the  evidence  provided  by  the  thermometer.  I 
exclude  the  possibility  that  Fred  has  tampered  with  the  thermom¬ 
eter  and  also  the  possibility  that  there  are  factors  affecting 
both  Fred's  truthfulness  and  the  thermometer's  accuracy.  But 
suppose  now  that  Fred  does  have  regular  access  to  the  thermom¬ 
eter,  and  I  think  that  he  would  likely  know  if  it  were  not 
working. I know from experience that it is just in situations like
this,  where  something  is  awry,  that  Fred  tends  to  let  his  fancy 
run  free. 

In this case, I would not assign the elements of S the
probabilities given in the second column of Table 1. Instead, I
might assign the probabilities given in the second column of
Table  2.  These  probabilities  follow  from  my  judgment  that  Fred 
is  truthful  80%  of  the  time  and  that  the  thermometer  has  a  99% 


chance  of  working,  together  with  the  further  judgment  that  Fred 
has  a  90%  chance  of  being  careless  if  the  thermometer  is  not 
working. 

When  I  apply  (1)  with  the  posterior  probabilities  given  in 
Table  2,  I  obtain  the  degrees  of  belief 

$$\mathrm{Bel}(\{\text{yes}\}) = .005 \quad \text{and} \quad \mathrm{Bel}(\{\text{no}\}) = .95.$$

These  differ  from  (4),  even  though  the  belief  functions  based  on 
the  separate  items  of  evidence  will  still  be  given  by  (2)  and 
(5). 
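
The probabilities of Table 2 follow from the three judgments stated in Example 3, and the arithmetic can be checked with a short sketch. The variable names below are assumptions chosen for illustration.

```python
# The three judgments stated in Example 3.
p_truthful = 0.80            # Fred is truthful 80% of the time
p_working = 0.99             # the thermometer has a 99% chance of working
p_careless_given_not = 0.90  # Fred is careless 90% of the time if it is not

# Initial probabilities on S, built from the "not working" cells outward.
p_not = 1.0 - p_working
initial = {
    ("careless", "not"): p_not * p_careless_given_not,
    ("truthful", "not"): p_not * (1.0 - p_careless_given_not),
}
initial[("truthful", "working")] = p_truthful - initial[("truthful", "not")]
initial[("careless", "working")] = (1.0 - p_truthful) - initial[("careless", "not")]

# Compatibility with T = {yes, no}, as in the last column of the table.
compat = {
    ("truthful", "working"): set(),  # ruled out: the two sources conflict
    ("truthful", "not"): {"yes"},
    ("careless", "working"): {"no"},
    ("careless", "not"): {"yes", "no"},
}

# Condition on the surviving elements and apply (1).
kept = {s: p for s, p in initial.items() if compat[s]}
total = sum(kept.values())
posterior = {s: p / total for s, p in kept.items()}
bel_yes = sum(p for s, p in posterior.items() if compat[s] <= {"yes"})
bel_no = sum(p for s, p in posterior.items() if compat[s] <= {"no"})
```

Only the joint measure on S has changed relative to Example 2; the marginal judgments about Fred and the thermometer are the same.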

2.5.  Conclusion.  I  would  like  to  emphasize  that  nothing  in 
the  philosophy  of  constructive  probability  or  the  language  of 
belief functions requires us to deny the fact that Bayesian arguments
are often valuable and convincing. The examples I have
just  discussed  were  designed  to  convince  the  reader  that  belief- 
function  arguments  are  sometimes  more  convincing  than  Bayesian 
arguments,  but  I  am  not  claiming  that  this  is  always  or  even 
usually  the  case.  What  the  language  of  belief  functions  does 
require  us  to  reject  is  the  philosophy  according  to  which  use  of 
the  Bayesian  language  is  normative. 


                          Probability of s       Elements of T
s                         Initial   Posterior    compatible with s

(truthful, working)       .799      0            none
(truthful, not)           .001      .005         yes
(careless, working)       .191      .950         no
(careless, not)           .009      .045         yes, no

Table 2.

From a technical point of view, the language of belief
functions is a generalization of the Bayesian language. But as
our examples illustrate, the spirit of the language of belief
functions can be distinguished from the spirit of the Bayesian
language by saying that a belief-function argument involves a
probability  model  for  the  evidence  bearing  on  a  question,  while  a 
Bayesian  argument  involves  a  probability  model  for  the  answer  to 
the  question. 

Of course, the Bayesian language can also model evidence.
As we have seen in our examples, the probability judgments made
in  a  belief-function  argument  can  usually  be  adapted  to  a  Baye¬ 
sian  argument  that  models  both  the  answer  to  the  question  and  the 
evidence  for  it  by  assessing  prior  probabilities  for  the  answer 
and  conditional  probabilities  for  the  evidence  given  the  answer. 
The  only  problem  is  that  we  may  lack  the  evidence  needed  to  make 
all  the  judgments  required  by  this  Bayesian  argument  convincing. 
Thus  we  may  say  that  the  advantage  gained  by  the  belief-function 
generalization  of  the  Bayesian  language  is  the  ability  to  use 
certain  kinds  of  incomplete  probability  models. 

3.  The  Attempt  to  Use  Probability  in  Production  Systems 

The field of expert systems developed within artificial intelligence
from efforts to apply systems of production rules to
practical  problems.  And  the  current  interest  in  probability 
judgment  in  artificial  intelligence  began  with  efforts  to  incor¬ 
porate  probability  judgments  into  production  rules.  In  this 
section  I  review  these  efforts  and  relate  them  to  what  we  learned 


in the preceding section about the Bayesian and belief-function
languages. 

A  production  rule  is  simply  an  if-then  statement,  inter¬ 
preted  as  an  instruction  for  modifying  the  contents  of  a  data 
base.  When  the  rule  is  applied,  the  action  specified  by  its 
right-hand  side  is  taken  if  the  condition  on  its  left-hand  side 
is  found  in  the  data  base.  A  production  system  is  a  collection 
of  production  rules,  which  are  repeatedly  applied  to  the  data 
base  either  in  the  same  predetermined  order  or  else  in  an  order 
determined by some relatively simple principle. Production systems
were used in programming languages in the early 1960's, and
they  were  advanced  as  cognitive  models  by  Newell  and  Simon  in  the 
late  1960's  and  early  1970's.  (See,  for  example,  Newell  and 
Simon,  1965,  and  Newell,  1973.)  Such  systems  are  attractive  as 
models  for  intelligence  because  their  knowledge  is  represented  in 
a  modular  way  and  is  readily  available  for  use.  Each  rule  repre¬ 
sents  a  discrete  chunk  of  knowledge.  Such  a  chunk  can  be  added 
to  or  removed  from  the  system  without  disrupting  its  ability  to 
use  the  other  chunks,  and  the  system  regularly  checks  all  the 
chunks  for  their  relevance  to  the  problem  at  hand.  (For  a  fuller 
account  of  production  systems,  see  Davis  and  King,  1984.) 
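
The description above can be made concrete with a minimal sketch of a production system that applies its rules repeatedly in the same predetermined order. The function name `run_production_system` and the two rules are hypothetical illustrations, not drawn from any system discussed here.

```python
def run_production_system(rules, database):
    """Apply if-then rules to a data base of facts, in a fixed order,
    until a full pass adds nothing new.

    rules    -- list of (condition, action) pairs, each a frozenset of facts
    database -- iterable of initial facts
    """
    facts = set(database)
    changed = True
    while changed:
        changed = False
        for condition, action in rules:  # the same predetermined order
            # Fire the rule if its left-hand side is in the data base
            # and its right-hand side would add something new.
            if condition <= facts and not action <= facts:
                facts |= action
                changed = True
    return facts


# Hypothetical rules; each is one modular chunk of knowledge.
rules = [
    (frozenset({"freezing", "wet"}), frozenset({"icy"})),
    (frozenset({"icy"}), frozenset({"slippery"})),
]
facts = run_production_system(rules, {"freezing", "wet"})
```

Removing either rule from the list leaves the other usable as it stands, which is the modularity that makes such systems attractive.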

When  artificial  intelligence  workers  undertook,  in  the 
1970's,  to  cast  various  bodies  of  practical  knowledge  in  the  form 
of  production  rules,  they  found  that  in  many  fields  knowledge 
cannot  be  encoded  in  the  form  of  unqualified  if-then  statements. 
Instead, probability statements seem to be required: "If E_1,
E_2, ..., E_n, then probably (or usually or almost certainly) H." So
these  workers  found  themselves  trying  to  use  production  systems 


to  manipulate  probability  judgments. 

Many  tacks  were  taken  in  the  effort  to  use  probability  in 
production  systems,  but  I  would  like  to  emphasize  two  important 
lines  of  development.  One  of  these  begins  with  PROSPECTOR  and 
leads  to  Pearl  and  Kim's  elegant  work  on  the  propagation  of 
Bayesian  probability  judgments  in  networks,  while  the  other  be¬ 
gins  with  the  certainty  factors  of  MYCIN  and  leads  to  the  use  of 
belief  functions  in  hierarchical  diagnosis.  I  will  review  these 
two  lines  of  development  in  turn. 

3.1.  Bayesian  Networks.  The  artificial  intelligence  wor¬ 
kers  at  SRI  who  developed  the  PROSPECTOR  system  for  geological 
exploration  in  the  middle  1970's  thought  of  production  rules  as  a 
means  for  propagating  probabilities  through  a  network  going  from 
evidence  to  hypotheses.  Figure  1,  taken  from  Duda  et  al.  (1976), 
gives  an  example  of  such  a  network;  here  Ei  denotes  an  item  of 
evidence,  and  Hi  denotes  a  hypothesis.  The  idea  is  that  the  user 
of  the  system  should  specify  that  some  of  the  Ei  at  the  bottom  of 
the  network  are  true  and  some  are  false,  or  should  make  probabil¬ 
ity  judgments  about  them,  and  the  production  rules,  corresponding 
to  conditional  probabilities  for  the  links  in  the  network,  should 
propagate  these  probability  judgments  through  the  network  to 
produce  judgments  of  the  probabilities  of  the  hypotheses. 

Figure 1. An inference network of evidence nodes and hypotheses (from Duda et al., 1976).

The first thing the PROSPECTOR workers noticed about this
scheme was that a Bayesian calculation of probabilities for the
hypotheses would require more than conditional probabilities
corresponding to the links and probabilities for the evidence
nodes at the bottom; it would also require prior probabilities
for the hypotheses and the other evidence nodes. So they abandoned
the idea of a pure production system at the outset by
requiring that the expert knowledge in the system should also
include these prior probabilities.

These  workers  retained  from  the  production  system  picture, 
however,  the  idea  that  an  expert’s  knowledge  should  come  in 
discrete  modular  chunks.  They  wanted  to  be  able  to  elicit  from 
the expert statements of the form, "If E_i and E_j and ..., then E_r
with  probability  p,"  without  constraining  the  expert  as  to  how 
these  statements  should  fit  together.  This  meant  that  they  still 
faced  problems  in  putting  these  chunks  of  knowledge  together  into 
a  calculation  of  the  probabilities  of  the  hypotheses.  Here  are 
three  of  their  problems:  (1)  The  conditional  probabilities 
elicited  may  not  be  sufficient  to  determine  a  joint  probability 


measure over all the E's and H's. If the expert is thinking in
terms of the network in Figure 1, for example, he may give rules
corresponding to P(E5|E8) and P(E5|E9) but neglect or feel unable
to give a rule corresponding to P(E5|E8&E9). (2) The conditional
probabilities that are given may be inconsistent. (3) The
network  may  have  cycles,  which  will  cause  trouble  when  propaga¬ 
tion  is  attempted. 

These  problems  were  handled  in  PROSPECTOR  in  relatively  ad 
hoc  ways.  Apparently  problem  (1)  was  handled  partly  by  indepen¬ 
dence  assumptions  and  partly  by  max-min  rules  reminiscent  of  the 
theory  of  fuzzy  sets.  Problem  (2)  was  handled  by  formulating 
rules of propagation which did not always accord with Bayesian
principles  but  which  were  insensitive  to  some  kinds  of  inconsis¬ 
tencies.  Problem  (3)  was  handled  by  arbitrarily  rejecting  new 
production  rules  when  they  would  introduce  cycles  into  the  net¬ 
work  already  constructed. 

PROSPECTOR  behaved  in  a  reasonably  intelligent  way.  But  the 
ad  hoc  character  of  its  procedures  made  many  people  ask  whether  a 
similar  propagation  of  probabilities  might  be  carried  out  in  a 
more  thoroughly  Bayesian  way.  This  question  has  been  answered  by 
Pearl (1982) and Kim (1983).

As Pearl and Kim show, we can make sense of the independence
assumptions  needed  to  construct  a  probability  measure  over  a 
network  from  simple  conditional  probabilities  and  we  can  propa¬ 
gate  updated  probabilities  through  the  network  in  a  simple  and 
elegant  way  provided  that  the  network  has  a  causal  interpretation 
and  a  relatively  simple  form;  it  must  be  a  simple  directed  tree 
or  else  a  slightly  more  general  directed  graph  called  a  Chow 


tree. (Pearl (1982) treated the case of the simple tree, and Kim
(1983)  treated  the  case  of  the  Chow  tree.) 

A Chow tree is simply a connected and directed graph such
that there is no cycle in the corresponding undirected graph; an
example is shown in Figure 2. In Pearl and Kim's work, nodes of
the  tree  correspond  to  random  variables,  and  the  directions  of 
the  links  are  interpreted  as  directions  of  causation.  Thus  each 
variable  is  influenced  by  the  variables  above  it  in  the  graph  and 
influences  the  variables  below  it.  An  observation  of  the  value 
of  one  variable  is  diagnostic  evidence  about  the  value  of  a 
higher  variable  and  causal  evidence  about  the  value  of  a  lower 
variable. 

Figure 2. Example of a Chow tree.

Once a Chow tree is constructed for a problem, the construction
of a probability measure over it and the updating of the
measure are straightforward. Given Kim's independence conditions,
which are reasonable in the causal context, a measure over
the tree can be constructed from prior probabilities for the
topmost  nodes  and  conditional  probabilities  for  all  the  links. 
Moreover,  this  construction  is  straightforward;  there  are  no 
complicated  consistency  conditions  that  the  conditional  proba¬ 
bilities  must  meet.  Once  construction  is  completed,  the  measure 
can  be  stored  and  updated  locally.  At  each  node  we  store  infor¬ 
mation  about  the  conditional  probabilities  corresponding  to  in¬ 
coming  and  outgoing  links,  the  current  probability  measures  for 
the  variable  at  the  node  and  the  variables  at  neighboring  higher 
nodes,  and  likelihood-type  information  from  neighboring  lower 
nodes.  When  the  value  of  a  variable  is  then  observed,  this 
information  can  be  propagated  through  the  network  to  update  the 
entire  probability  measure  in  one  pass.  All  computations  are 
made  locally,  with  each  node  communicating  only  updated  local 
information  to  its  neighbors. 
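
As a rough illustration of the construction step, the sketch below builds a joint measure over a small causal tree from a prior for the topmost node and conditional probabilities for the links, and then updates it when a lower variable is observed. The tree, the numbers, and the function names are all hypothetical, and the update is done by brute-force enumeration rather than by the local message passing of Pearl and Kim.

```python
from itertools import product

# Hypothetical causal tree with binary variables: A -> B and A -> C.
prior_a = {True: 0.3, False: 0.7}      # prior for the topmost node
p_b_given_a = {True: 0.9, False: 0.2}  # P(B is true | A), one link
p_c_given_a = {True: 0.6, False: 0.1}  # P(C is true | A), the other link


def joint(a, b, c):
    """Joint probability, assuming B and C are independent given A."""
    pb = p_b_given_a[a] if b else 1.0 - p_b_given_a[a]
    pc = p_c_given_a[a] if c else 1.0 - p_c_given_a[a]
    return prior_a[a] * pb * pc


def posterior_a_given_b(b_observed):
    """Update A after observing B: diagnostic evidence about a higher node."""
    weights = {
        a: sum(joint(a, b_observed, c) for c in (True, False))
        for a in (True, False)
    }
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Any choice of prior and link probabilities yields a proper measure.
total_mass = sum(joint(a, b, c) for a, b, c in product((True, False), repeat=3))
post = posterior_a_given_b(True)
```

No consistency check on the inputs is needed here, which reflects the point made above about trees; enumeration, however, grows exponentially with the number of nodes, which is what local propagation avoids.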

An  obvious  shortcoming  of  this  elegant  scheme  is  its  re¬ 
striction  to  Chow  trees.  In  few  problems  will  the  causal  rela¬ 
tions  that  we  think  important  take  so  simple  a  form.  Kim  sug¬ 
gests  that  we  might  use  such  Chow  trees  as  approximations  to  more 
realistic  models;  first  elicit  a  probability  measure  on  a  more 
complicated  graph  from  an  expert,  and  then  choose  the  Chow  tree 
that  best  approximates  this  more  complicated  measure  (Chow  and 
Liu,  1968).  This  suggestion  does  not  seem  very  satisfactory, 
however.  We  are  given  no  reason  to  hope  that  the  approximation 
will  be  satisfactory,  and  perhaps  more  importantly,  the  construc¬ 
tive  nature  of  the  initial  probability  measure  is  put  into  ques¬ 
tion.  In  a  Chow  tree  the  initial  probability  measure  can  be 
constructed from probability and conditional probability judgments
without concerns about consistency, but in a more general
graph  consistency  conditions  will  be  so  complicated  that  it  will 
be  impossible  for  us  to  hope  they  will  be  met  unless  we  pretend 
that  we  are  indeed  eliciting  a  measure  instead  of  constructing 
one. 

Another  obvious  shortcoming  is  the  restriction  to  thoroughly 
causal  models.  Kim  notes  this  problem  as  follows:  "Although 
causal  relationship  is  the  most  important  one  in  situation 
assessment  decision-making,  it  alone  is  insufficient  to  achieve 
an  expert  level  of  performance.  Additional  studies  are  needed  to 
find  ways  of  integrating  causal  relationships  with  other  kinds  of 
relationships  to  infer  more  valid  conclusions." 

In  a  sense,  of  course,  all  evidence  is  causal.  We  can 
always  construct  a  model  that  relates  the  facts  we  observe  to 
deeper  causes  and  also  relates  these  causes  to  the  questions  that 
interest  us.  The  difficulty  is  that  we  may  lack  the  evidence 
needed  to  make  good  probability  judgments  relative  to  such  a 
model.  The  point  that  causal  models  are  insufficient  is  really, 
therefore,  subsidiary  to  the  more  general  point,  made  in  section 
2 above, that we sometimes lack the evidence needed for a
convincing Bayesian argument.

3.2. Certainty Factors and Belief Functions. Though I have
begun my discussion of the effort to put probability in
production systems with the PROSPECTOR story, the work on the MYCIN
system for medical diagnosis began earlier and has been more
extensive. The story of the MYCIN effort has been told in a
recent book (Buchanan and Shortliffe, 1984), which includes
extensive discussion of the certainty factors that were used by
MYCIN and the relation of these certainty factors to belief
functions.

MYCIN  departed  from  the  pure  production  system  picture  by 
using  a  backward-chaining  strategy  to  select  production  rules  to 
apply.  This  means  that  it  selected  rules  by  comparing  their 
right-hand  sides  to  goals  instead  of  comparing  their  left-hand 
sides  to  statements  already  accepted.  If  the  right-hand  side  of 
a  rule  matched  a  goal,  its  left-hand  side  was  then  established  as 
a  goal,  so  that  there  was  a  step-by-step  process  backwards  from 
conclusions  to  the  knowledge  needed  to  establish  them. 
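
A minimal sketch of backward chaining, under stated assumptions:
the rule format (a tuple of left-hand-side premises paired with a
right-hand-side conclusion), the function name, and the abstract
symbols are all hypothetical and are not taken from MYCIN itself.

```python
def prove(goal, rules, facts, trace, pending=frozenset()):
    """Return True if `goal` can be established by chaining backwards from it."""
    if goal in facts:
        return True
    if goal in pending:               # guard against circular rule sets
        return False
    for premises, conclusion in rules:
        if conclusion != goal:        # select rules by right-hand side
            continue
        trace.append((goal, premises))   # the premises become new goals
        if all(prove(p, rules, facts, trace, pending | {goal})
               for p in premises):
            return True
    return False


# Hypothetical rule base: "p and q give goal", "r gives p".
RULES = [(("p", "q"), "goal"), (("r",), "p")]

trace = []
established = prove("goal", RULES, {"q", "r"}, trace)
```

The trace records the order in which goals were set up: first
the premises of the rule concluding "goal", then the premise of
the rule concluding "p". A forward-chaining system would instead
start from the accepted statements "q" and "r" and match
left-hand sides.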

MYCIN also differed from PROSPECTOR in that the MYCIN
workers rejected at the outset the idea that the numerical
probability judgments associated with the rules could or should be
understood in Bayesian terms. They emphasized this point by calling
these numbers "certainty factors" rather than probabilities. And
they formulated their own rules for combining these certainty
factors.

In spirit, and to a considerable extent in form, these rules
were quite like special cases of Dempster's rule for combining
independent belief functions. I would explain this coincidence
by saying that in developing their calculus for certainty
factors, Shortliffe and Buchanan were trying to model the
probabilistic nature of evidence while avoiding the complete probability
models needed for Bayesian arguments.
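
One instance of the resemblance can be checked by arithmetic. The
sketch below assumes the usual statement of the combining rule
for two positive certainty factors bearing on the same
hypothesis, x + y(1 - x), and compares it with Dempster's rule
applied to two simple support functions focused on the same
subset. It covers only this positive, same-hypothesis case, and
the function names are hypothetical.

```python
def combine_positive_cfs(cf1, cf2):
    """Combine two certainty factors in [0, 1] that favor the same hypothesis."""
    return cf1 + cf2 * (1.0 - cf1)


def dempster_same_focus(s1, s2):
    """Dempster's rule for two simple support functions with the same focus A.

    Each function commits s_i to A and 1 - s_i to the whole frame.  All
    intersections are nonempty, so there is no conflict to renormalize,
    and only the product (1 - s1)(1 - s2) stays on the whole frame.
    """
    mass_on_frame = (1.0 - s1) * (1.0 - s2)
    return 1.0 - mass_on_frame


cf = combine_positive_cfs(0.6, 0.5)      # 0.6 + 0.5 * 0.4
bel = dempster_same_focus(0.6, 0.5)      # 1 - 0.4 * 0.5
```

Both expressions reduce to 1 - (1 - x)(1 - y), so the two
calculations agree for any pair of values between 0 and 1.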

In recent work (Gordon and Shortliffe, 1984, 1985), the
MYCIN workers have taken a close look at the similarity between
the calculus of certainty factors and the language of belief
functions and have asked how belief functions can contribute
further to the MYCIN project. They have drawn two main
conclusions. First, it is sensible to modify some of the rules for
certainty factors to put these rules into more exact agreement
with the rules for belief functions. Second, the diagnosis
problem that was central to MYCIN can be understood more clearly
in terms of belief functions if it is explicitly expressed as a
problem involving hierarchical hypotheses.

The term "hierarchical hypotheses" refers to the fact that
the items of evidence in a diagnostic problem tend to support
directly only certain subsets of the frame of discernment,
subsets which can be arranged in a tree. Figure 3, taken from
Gordon and Shortliffe (1984), illustrates the point. The four
nodes at the bottom of this tree represent four distinct causes
of cholestatic jaundice; they form the frame of discernment for
the diagnostic problem. Some items of evidence may directly
support (or directly refute) one of these causes for a particular
patient's jaundice. Other evidence may be less specific. There
may, for example, be evidence that the jaundice is due to an
intrinsic liver problem, either hepatitis or cirrhosis. On the
other hand, it is hard to imagine a single item of medical
evidence supporting the subset {cirrhosis, gallstone} without
supporting one of these more directly; this is reflected by the fact
that this subset does not correspond to an intermediate node of
the tree.

[Figure 3. Tree of hierarchical hypotheses for cholestatic
jaundice, from Gordon and Shortliffe (1984). Node labels legible
in this copy: cholestatic jaundice, hepatitis, cirrhosis.]

This picture suggests that a belief-function argument based
on such medical evidence may involve combining many belief
functions by Dempster's rule, where each belief function is a simple
support function focused on a subset in the tree or its
complement. (A simple support function is a belief function obtained
from (1) when S has only two elements and one of these is
compatible with all the elements of T.)

Though the tree structure provides a conceptual
simplification of the problem, the combination of simple support functions
corresponding to subsets in the tree and their complements can
result in a very complex belief function and hence threatens to
involve prohibitive computation. Gordon and Shortliffe (1985)
have proposed a modification of Dempster's rule for this
situation that would involve less computation and would often give
similar results.
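
The combination just described can be sketched directly. This is
an illustration under stated assumptions, not Gordon and
Shortliffe's procedure: the degrees of support are invented, the
label "other_cause" is a placeholder for the fourth cause (which
the text does not name), and the evidence is assumed independent
as Dempster's rule requires.

```python
from itertools import product

# Frame of discernment; "other_cause" is a placeholder label.
FRAME = frozenset({"hepatitis", "cirrhosis", "gallstone", "other_cause"})


def simple_support(focus, degree):
    """Mass function committing `degree` to `focus`, the rest to the whole frame."""
    return {frozenset(focus): degree, FRAME: 1.0 - degree}


def dempster(m1, m2):
    """Combine two mass functions by Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, subset):
    """Degree of belief in `subset`: total mass of focal elements inside it."""
    subset = frozenset(subset)
    return sum(w for s, w in m.items() if s <= subset)


# Three hypothetical items of evidence (degrees of support are invented).
liver = simple_support({"hepatitis", "cirrhosis"}, 0.6)    # intrinsic liver problem
hep = simple_support({"hepatitis"}, 0.5)                   # supports one cause
not_gall = simple_support(FRAME - {"gallstone"}, 0.4)      # refutes gallstone

m = dempster(dempster(liver, hep), not_gall)
```

With only three items the result already has four focal
elements. Items focused on complements of tree nodes are what
make the number of focal elements grow quickly as more evidence
is combined.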

I believe that this modification can be avoided. Though
full computation of the belief function resulting from Dempster's
rule often would be prohibitive, we would seldom need full
computation. Usually we would need only to identify subsets in the
tree that have high degrees of belief and to compute these
degrees of belief. This should usually be achievable by careful
computational strategies.

Violations of the independence assumptions needed for
Dempster's rule may pose a more important problem. It seems unlikely
that the uncertainties involved in a very large number of items
of medical evidence will all be independent. This does not mean
that a belief-function analysis will be impossible or
unsatisfactory, but it does mean that a satisfactory belief-function
analysis may require modelling dependencies in the evidence.

The two needs identified here, the need for effective
computational strategies and the need for models for dependent
evidence, also arise in many other contexts.

3.3. Conclusion. The preceding look at attempts to use
probability judgment in expert systems justifies, I think, a
general conclusion: probability judgment in expert systems is
very much like probability judgment everywhere else. Though the
builders of MYCIN and PROSPECTOR worked in the context of
artificial intelligence ideas, the thinking about probability judgment
that has emerged from this work ten years later does not have a
distinctive artificial intelligence flavor.

The general issues about probability judgment that we
identified in section 2 above all re-appear in the expert systems
work. In expert systems, as elsewhere, probability judgment is
constructive and requires an overall design; it is generally
possible to provide such a design within the Bayesian language,
but Bayesian designs often demand judgments for which we do not
have adequate evidence. And belief-function analyses often
require models for dependent evidence.

Production systems were attractive to the artificial
intelligence community because these systems seemed to have the
flexibility in acquiring and using knowledge that seems characteristic
of intelligence. But it seems fair to say that the attempt to
incorporate probability judgment into production systems failed.
PROSPECTOR and MYCIN themselves retained a good deal of the
flavor of production systems, but little of this is left in the
work of Pearl and Kim or in the proposal to use belief functions
in hierarchical trees. It appears that probability judgment
simply does not have the modular character that made production
systems so attractive. Almost always, probability judgment
involves not only individual numerical judgments but also judgments
about how these can be put together. This is because probability
judgment consists, in the final analysis, of a comparison of an
actual problem to a scale of canonical examples.

(Expert systems that are based on production rules without
probability judgment have been very successful; R1 and DART are
often cited as examples of commercial success. The developers of
these systems have been heard to say, however, that their
methodology is best used in problems where probability judgment is not
needed.)

I would suggest that the expert systems we see using
probability in the near future are not likely to have the flexibility
and judgmental capacity that we associate with genuine
intelligence. Instead, these systems will continue to leave the work of
genuine intelligence to their designers and users. Their
designers will have to design the forms of probability argument for
the particular problem, and their users will have to supply the
probability judgments.

4. The Construction of Arguments

I have emphasized that a genuine capacity for probability
judgment in an artificial intelligence would involve both the
ability to generate numerical probability judgments and the
ability to design probability arguments. How might these abilities
be programmed? We do not have an answer, but we should start
thinking about the question.

As the result of the work by psychologists during the past
decade, especially the work of Kahneman and Tversky (see
Kahneman, Slovic, and Tversky, 1982), we do have some ideas about how
people generate numerical probability judgments. They conduct
internal sampling experiments, they make similarity judgments,
they construct causal models and perform mental simulations with
these models, they consider typical values and discount or adjust
these, and so on. An obvious and appropriate strategy for
artificial intelligence is to try to implement these heuristics.

The heuristics sometimes lead to systematic mistakes or
biases, and it is by demonstrating these biases that the
psychologists have convinced us that people use them. There is a
tendency, therefore, to think that people are doing something
suboptimal or unnormative when they use them. Indeed, proponents of
the Bayesian philosophy frequently assert that the psychological
work only demonstrates what people do do and is irrelevant to
what people should do. Presumably this means that instead of
using the heuristics, they should first realize that it is
normative for them to have preferences satisfying Savage's postulates,
then decide to pretend that they do have such preferences, and
then try to figure out what they are.

When we face up to the artificial intelligence problem,
however, we see that the heuristics are really all we have.
People have to use such heuristics if they are to make quick
probability judgments about questions they have not previously
considered, and our programs will also have to use them if they
are going to be equally flexible. The challenge is to figure out
how to use the heuristics well enough that using them will not
usually cause mistakes.

Implementing the heuristics involves us, of course, in all
the issues of knowledge representation, for we must have a
flexible way of matching the problem about which we want to make
a judgment with similar problems in our memory.

It is more difficult to say anything about how we might
build the ability to design probability judgments. The lesson
from section 3 is clear, though: the chunks that we try to fit
together when we search for a convincing argument must be larger
than the chunks represented by probabilistic production rules.
It is also clear that the ability to construct convincing
probability arguments must include an ability to evaluate whether a
probability argument is convincing.

Though these questions are difficult, they should be taken
as a challenge by students of probability. I believe that
probability judgment will turn out to be possible and important in
artificial intelligence, but the extent of its ultimate
usefulness cannot be taken for granted; it must be demonstrated.


References 

Barr, A., and Feigenbaum, E. A. (1981) The Handbook of
Artificial Intelligence, Volumes I and II. William Kaufmann,
Inc., Los Altos, California.

Box, G. E. P. (1980) Sampling and Bayes' inference in scientific
modelling and robustness. Journal of the Royal Statistical
Society, Series A 143:383-430.

Buchanan, B. G., and Shortliffe, E. H. (1984) Rule-Based Expert
Systems: The MYCIN Experiments of the Stanford Heuristic
Programming Project. Addison-Wesley, Reading, Massachusetts.

Chow, C. K., and Liu, C. N. (1968) Approximating discrete
probability distributions with dependence trees. IEEE
Transactions on Information Theory IT-14:462-467.

Cohen, P. R., and Feigenbaum, E. A. (1982) The Handbook of
Artificial Intelligence, Volume III. William Kaufmann,
Inc., Los Altos, California.

Cohen, P. R. (1983) Heuristic reasoning about uncertainty: an
artificial intelligence approach. Report No. STAN-CS-83-986,
Department of Computer Science, Stanford University.

Davis, R., and King, J. J. (1984) The origin of rule-based
systems in AI. In Buchanan and Shortliffe, pp. 20-52.

Doyle, J. (1979) A truth maintenance system. Artificial Intel-

Duda, R. O., et al. (1976) Subjective Bayesian methods for
rule-based inference systems. In AFIPS Conference Proceedings of
the 1976 National Computer Conference, Volume 45 (New York),
pp. 1075-1082.

Gordon, J., and Shortliffe, E. H. (1984) The Dempster-Shafer
theory of evidence. In Buchanan and Shortliffe, pp. 272-292.

Gordon, J., and Shortliffe, E. H. (1985) A method for managing
evidential reasoning in hierarchical hypothesis spaces. To
appear in Artificial Intelligence.

Kahneman, D., Slovic, P., and Tversky, A. (1982) Judgment under
Uncertainty: Heuristics and Biases. Cambridge University
Press.

Kim, J. H. (1983) CONVINCE: A Conversational Inference
Consolidation Engine. Doctoral dissertation, Computer Science,
University of California at Los Angeles.

Marr, D. (1982) Vision. Freeman, San Francisco.

Newell, A. (1973) Production systems: models of control
structures. In Visual Information Processing, W. G. Chase, ed.,
pp. 463-526. Academic Press, New York.

Newell, A., and Simon, H. A. (1965) An example of human chess
play in the light of chess playing programs. In Progress in
Biocybernetics, N. Wiener and J. P. Schade, eds. Elsevier,
Amsterdam.

Pearl, J. (1982) Distributed Bayesian belief maintenance.
Proceedings of the Second National Conference on Artificial

Shafer, G. (1976) A Mathematical Theory of Evidence. Princeton
University Press.

Shafer, G. (1981) Constructive probability. Synthese 48:1-60.

Shafer, G. (1985) Belief functions and possibility measures. To
appear in The Analysis of Fuzzy Information, Volume 1, J. C.
Bezdek, ed., CRC Press.

Shafer, G., and Tversky, A. (1985) Languages and designs for
probability judgment. To appear in Cognitive Science.


TRANSCRIPT OF ORAL PRESENTATION BY GLENN SHAFER:
PROBABILITY  JUDGMENT 

IN  ARTIFICIAL  INTELLIGENCE  AND  EXPERT  SYSTEMS 


DR. SHAFER: I would like to speak a little more broadly than the
topic of belief functions.

I think belief function argument is a kind of probability
argument. To understand what its role is, or could be, in artificial
intelligence and expert systems I'd like to talk a little more generally
also about the role of probability in artificial intelligence.

Here's a list of topics I would like to discuss. I'd like to
talk about what belief function arguments are, and contrast them with
the Bayesian arguments. A belief function argument, as I said, is a
probability argument but it is often distinct from the Bayesian
argument. I would like to talk about the general philosophy of what I
call constructive probability, according to which a probability judgment
always involves a comparison of an actual problem to a scale of
canonical examples, where the canonical examples are usually familiar
examples from the picture of games of chance.

I'd like to talk about what the role of probability is in expert
systems. I think it often is essential, but expert systems seem to
require flexibility which we are not accustomed to with probability
arguments.

Finally, I would like to talk about the role of probability in
artificial intelligence proper, as distinguished from expert systems,
which I think is an area which has not been explored very much in the
course of the recent interest in the subject.

In fact, I think it would be interesting to start with the
question "why has there been so little probability in artificial
intelligence?"

Lately we've seen an explosion of interest in the topic but that
has to be contrasted with the previous 20 years when there was
practically none on the part of the artificial intelligence community.

Why was that? Well, I think there are some basic historical
reasons. First of all, artificial intelligence began in the 1950s, very
self-consciously, by contrasting itself with what computers could then
do, which was crunch numbers.

The idea was that intelligence surely involves something more
than that, some kind of more general symbolic manipulation, and so
numbers and probabilities in particular were something that they were
not, by definition, interested in.


That attitude was buttressed by kind of a folk argument,
something that you don't encounter in print, at least very often, but
you often encounter in talking to people in the artificial intelligence
community. I call it the input-output argument, which says that
artificial intelligence is an attempt to imitate human intelligence and
in the case of human intelligence the inputs are nonnumerical and the
outputs are nonnumerical.

What use is there for numbers in between? Perhaps you could
phrase something that goes on in between in terms of numbers, but why
bother?

Basically the intelligence, as it were, must be a program of some
kind for going from inputs to outputs. Why not just tell what that
program really is doing? What it's doing must surely have nothing to do
with numbers, so why talk about probabilities at all in artificial
intelligence?

I think that argument is further buttressed by some developments
within artificial intelligence which present themselves in some sense as
alternatives to probability. One of those is the idea of nonmonotonic
logic, a different way of handling uncertainty.

More recently a fellow named Paul Cohen who is now at U Mass
Amherst has talked about following more closely the input-output
argument and talked about giving explicitly the reasons or the
endorsements that a program would need to take certain actions, again
a self-conscious alternative to probability.

Why the current interest in probability? Why the change?

Well, it is pretty clear that the cause of this interest is the
expert system idea.

The artificial intelligence people got interested in using their
ideas about knowledge representation to build programs which they would
call intelligence systems that would make an expert's knowledge
available to a nonexpert user, and as soon as they started looking in
that direction they found that in some domains expert knowledge seems to
have to involve probability judgments. You can't make absolute what the
expert knows or is able to tell you. It doesn't seem to be in an
absolute form: if such-and-such, then such-and-such. It's only: if
such-and-such, then probably such-and-such.

Since you are going here on a human judgment, you can put human
inputs, and therefore numerical inputs, into the system. Thus we've
gotten away from the assumption, in expert systems as opposed to the
original artificial intelligence problem, that the inputs are
nonnumerical, and immediately have both a need and an opening for
probability.


As I said at the beginning, I think of belief function arguments
as a special kind of probability argument. Why have the AI people been
so interested in belief function arguments?

I once visited a computer science department about a year ago and
they knew I was a statistician so they took me upstairs and introduced
me to the chairman of the statistics department. After a little bit of
gossip he turned to me very gravely and said, "Why are these people so
interested in what you do?"

This is the question I want to ask here: why are these people
so interested in belief functions?

I think the answer has to do with their hope for a modular
representation of probabilistic knowledge.

At the time that expert systems were getting started in the
1970s, a fundamental idea in the artificial intelligence community (I
haven't been part of that community and I think there are some people
here that are and they might be able to correct me) was that the
flexibility of human intelligence might be explained in terms of certain
modularity in the representation of knowledge. Apparently things are
arranged in our heads in such a way that individual items of knowledge
can be added or removed without disrupting the whole system, which
contrasts very strongly with what we are accustomed to in terms of
structured computer programming. You can't take one line out of a
FORTRAN program and expect the thing to work.

It also contrasts very strongly with the Bayesian probability
argument. You can't take one of the inputs out of a Bayesian argument
and expect the thing to work.

Somehow human intelligence does go that way. The structures are
built in such a way that you can get along without any particular thing.
With the talk about combination of evidence, discrete items of evidence
being put together, and being able to make judgments on the basis of
limited evidence, belief functions seem to offer some hope in that
direction.

That's my perception of why this interest has come up. I don't
know how far I'll get this morning, but I'd like to offer you part of my
conclusion in advance.

The conclusion is (a biased one, of course) that I think belief
functions are more flexible and more modular in some respects than
Bayesian arguments, but I don't think they are as modular as what the
artificial intelligence people were looking for in the 1970s.

I think that all probability argument involves an overall design,
in some sense. We can't get the kind of modularity that the AI
community was looking for when they were building expert systems based
on production rules. When we really get into probability, expert
systems are going to look different than they do now. There are a lot
of expert systems and only some of them came from the artificial
intelligence community. The ones that came from the artificial
intelligence community, like DART and XCON, are largely based on the
idea of production rules and do have this modularity which I don't think
we're going to have when we get successful systems based on probability.

Another conclusion is that in the case of probability, the expert
systems we see in the future may not be that strongly influenced by
ideas coming out of the artificial intelligence community at the present
time.

With that beginning, I'd like to go to the part of the talk that
I was asked to give, which is about belief functions.

To talk about belief functions quickly, I would like to first
introduce the idea of a frame of discernment, which is a list of
possible answers to a question. It's what we are accustomed to calling
a sample space in statistics, though sometimes it's a parameter space,
any space that we could put a probability measure on.

I want to talk about the idea of a compatibility relation between
two frames. Say S is one frame and T is another and we have an element
little s in S and a little t in T. Let's write sCt to mean that little
s is compatible with little t. That means that little s could be the
answer to the first question and at the same time little t could be the
answer to the second question. It's possible for s and t both to be the
answers.


Let me quickly in an abstract way tell you what a belief function
is, in case you aren't clued in.

The idea is that you start with a probability measure P (perhaps
based on frequencies from your experience) on S, and then you get the
belief function on the second frame T, by setting
$\mathrm{Bel}(B) = P\{s \in S : sCt \text{ implies } t \in B\}$.

The statement here is any probability on S that can only be
associated with answers to the second question which are in B should
count as a reason to believe that the answer to the second question is
in B.


I didn't say that very smoothly. Let me see if I have it
written down any better.

Bel(B) measures the reason we have to believe B based on the
probability and the compatibility relation between S and T.

Well, that's only if we didn't know anything else. The idea is
this is the reason we have based on P and its compatibility relation, so
the idea is we might have some other evidence that more directly bears
on T but if we want to leave that aside for the moment or if we don't
have it, or if we just want to talk about what support we have for B
based on the evidence summarized by P and by the compatibility relation
between S and T, if we just want to make judgments based on that
evidence, this seems to be the kind of judgment we want to make.




Let me give an example. Fred comes sauntering over to my desk
and he's about to tell me something and the question in my mind is is
this going to be on the level or is it just, like he does sometimes,
talking without paying attention. My experience is that he's truthful
80 percent of the time and he's careless 20 percent of the time and I
generally can't tell what he's up to.

The second question that's on my mind is whether the streets are
icy outside, as sometimes happens in Kansas. The possible answers to
that question are yes and no, so we have two questions. Here are the
possible answers to them and I have a probability measure on the first.
I might have some evidence more directly about whether the streets are
icy but I want to think about the situation where my evidence is just
Fred's telling me they are icy. (see Slide 1)

Well, since I have this high confidence in what Fred says, about
80 percent, that seems to give me some reason just in itself to believe
the streets are icy.

Now what's the compatibility relation? Well, once Fred has told
me this, then Fred's being truthful is compatible only with the streets
being icy, but Fred's being careless is compatible with either one.
That's what this says here. That's our compatibility relation. (see
Slide 2)

Let's apply that formula I wrote down to this situation. Here's
the formula. In this case we have probability .8 on truthful, .2 on
careless, truthful is only compatible with yes, but careless is
compatible with yes and no.

Hhat  does  this  foraula  tall  us?  Hall,  in  this  casa,  B  just 
consists  of  a  singla  point,  "yas,*  so  hara  I  just  want  all  s's  that  ara 
ooapatibla  only  with  yas  and  that's  this  ona,  so  ay  dagraa  of  baliaf  for 
yas  is  aight-tanths. 

On  tha  othar  hand,  thara  ara  no  s's  that  ara  ooapatibla  only  with 
"no"  bacausa  earalass  is  ooapatibla  with  both,  so  I  don't  hava  anything 
hara.  Hy  dagraa  of  baliaf  for  no  is  xaro. 

So on the basis of Fred's testimony alone, I have an eight-tenths
reason to believe that the streets are icy outside and zero reason not
to believe it. That's the basic idea of belief functions. (see Slide
3)
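
The calculation just described can be sketched in a few lines of Python.
The formula itself is on a slide that is not reproduced here, so this
sketch follows only the spoken description: the degree of belief in a
set B of answers is the total probability of the s's whose compatible
answers all lie in B. The names prob_s, compatible and belief are
illustrative assumptions, not notation from the talk.

```python
# Background frame S with the probability judgments about Fred.
prob_s = {"truthful": 0.8, "careless": 0.2}

# Compatibility relation after Fred says "icy": each s maps to the
# answers in T = {"yes", "no"} that it is compatible with.
# (Assumes every s is compatible with at least one answer.)
compatible = {
    "truthful": {"yes"},
    "careless": {"yes", "no"},
}

def belief(b, prob_s, compatible):
    """Total probability of the s whose compatible answers all lie in b."""
    b = set(b)
    return sum(p for s, p in prob_s.items() if compatible[s] <= b)

bel_yes = belief({"yes"}, prob_s, compatible)           # only "truthful" qualifies
bel_no = belief({"no"}, prob_s, compatible)             # no s is compatible only with "no"
bel_either = belief({"yes", "no"}, prob_s, compatible)  # every s qualifies
```

Under these assumptions bel_yes is the eight-tenths and bel_no is the
zero from the talk; the two do not add to one, which is the point where
a belief function departs from an ordinary probability on T.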

In some sense a belief function argument is just a probability
argument. What's going on is that we're putting the probabilities on a
space or a frame in the background and we are looking at the
implications for a different frame which more directly interests us. So
in a sense, all the usual things you can do with probability you can
also do with belief functions. It's just that the probability stuff
you're doing back here and it's after you get that done that you then
look at what's going on in the frame of interest.


This also makes it clear that belief functions are a
generalization of the usual Bayesian approach to probability because you
could after all consider the special case where S and T are the same.
Then you're working directly. The compatibility relation in that case
would just be that an element here is compatible only with itself in the
second copy so

DR. SINGPURWALLA: Why is truthful not compatible with no?

DR. SHAFER: Because of what he said. This is after the fact.

He said it's icy outside. It's his statement. Our knowledge
establishes the compatibility relation between the two frames.

DR. DeGROOT: Since there is a break, let me just interrupt for
one moment and say something I should have said before we started, and
that is I think during the talk you should feel free to interrupt but
only for clarification questions. You should be able to do, if that's
okay with you, I think the audience should be able to do that, but we'll
give you the traditional speaker's privilege of refusing to answer
questions that you don't like during the talk and after the talk you no
longer have that privilege of refusing to answer.

DR. SHAFER: Let us contrast what I was just doing with a
Bayesian analysis.

There are a lot of Bayesian analyses for a given problem but
what would you do if you wanted to take this eight-tenths and two-tenths
that I was talking about and extend it to a full Bayesian argument?

Well, you would need two things. First of all, the point that is
most often emphasized when we're contrasting Bayesian with traditional
statistical arguments, you need prior probabilities. What is your other
evidence for whether or not it's icy outside? Let's say you have
probability p without Fred's testimony, probability p that it is icy,
probability 1-p that it isn't icy.

There's another thing you need too in order to do the Bayesian
argument. You need to break down this "careless" into two cases, where
he's careless but what he says is true, and where he's careless and what
he says is false. You have to put that in your probability model too.

So you put q of that two-tenths into careless but true, and 1-q in
careless but false. Then you can do the Bayesian thing.

That can be represented by Bayes' theorem but let's not. Let's
say that if you put these judgments together before you heard what Fred
said, S and T were independent, so you could form the product measure on
S and T. Then you would condition on the knowledge that you got from
what Fred said. (see Slide 4)

If they're independent then you multiply the 8/10 times the p to
get the probability that Fred is going to be truthful and also it's icy
outside, et cetera. These numbers here represent the product measure,
and then of course you want to condition on the fact that Fred does say
that it is icy, in which case he can't be truthful at the same time as
it's not icy and also he can't be saying something false at the same
time it's not icy, et cetera. These three things are crossed out and you
renormalize as usual with conditioning and so this would be your
probability for "yes, it is icy outside."
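
As a rough sketch of the conditioning just described (Slide 4 is not
reproduced here, so the layout of the cells is an assumption): Fred is
truthful with probability .8, careless but true with .2q, careless but
false with .2(1-q), independently of whether it is icy (p) or not
(1-p). Once Fred has said "icy," only three of the six cells remain
logically possible, and the function below renormalizes over them. The
function name and its arguments are illustrative.

```python
def bayes_prob_icy(p, q, truthful=0.8):
    """P(icy | Fred says icy) under the full Bayesian model sketched above.

    p: prior probability that it is icy (without Fred's testimony)
    q: share of the careless cases in which what Fred says is true
    Undefined (division by zero) in the degenerate case p == 0 and q == 1.
    """
    careless = 1.0 - truthful
    # Product-measure cells that survive once Fred has said "icy":
    truthful_icy = truthful * p
    careless_true_icy = careless * q * p
    careless_false_not_icy = careless * (1.0 - q) * (1.0 - p)
    icy = truthful_icy + careless_true_icy
    return icy / (icy + careless_false_not_icy)
```

For example, p = 0.5 and q = 0.5 give 0.9, while p = 0.1 and q = 0.5
give 0.5, so the Bayesian answer moves with p and q in a way the
belief function answer of eight-tenths does not.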

Now this gives a different answer in general than the belief
function argument for your degree of belief that it's icy outside. What
it actually would be depends on what p and q are.

The belief function argument, contrasted with this, uses a less
complete probability model.

Now, of course, if this probability model were somehow correct, if
things were really developing by chance and these really were the
chances, of course this answer would be right. Clearly, the belief
function answer would be wrong.

There's a tendency to say, well, maybe we don't know what p and q
are, but they must be something, and whatever they are it isn't likely
this formula will agree with the belief function answer, so the belief
function answer must be wrong.

My answer to that argument is that basically this chance thing is
not in any sense what's really going on. What we're doing is we're
modeling what's really going on by comparison to the picture of chance.

I may refuse to go all the way and do the Bayesian analysis. The
question is whether or not I feel I have the evidence to support these
judgments.

The belief function idea is that perhaps in some situations we
can gain more flexibility by making only comparisons to the picture of
chance for which we have enough evidence. Some other comparisons we
feel we don't have any evidence for. Perhaps there is some regularity
to Fred's behavior in terms of whether he's being careless or truthful
but there may not be any regularity that we can make out as to whether
or not what he says is true when he's being careless and we don't want
to model that by comparison to a chance picture.

It's not as if there's some comparison which is right, or as if
there are some chances which are right and we just don't know them.
Rather, we don't want to make the comparison to the chance picture at
all.

With that attitude, we don't want to make a comparison we don't
feel is a good comparison. With that attitude this number here has no
reality. It's not like there are some p's and q's we don't know. There
just aren't any. If there were and somebody else knew what they were,
we certainly wouldn't want to bet with that person if we were doing the
belief function analysis, but then of course we don't want to bet with
people that know more than we do in general.


VOICE: Is it in order to ask what you mean by "chance"?


DR. SHAFER: Yeah. There are these balls and urns and dice and
things in physics. I'd better hurry through a couple of other examples.

I want to talk about combining evidence because this is essential
to the belief function picture.

Let's say we did have some other evidence about whether it's icy
or not that we want to bring in. Fred's testimony is one item of
evidence. Let's say I had another. Suppose I have an indoor-outdoor
thermometer near my desk and the indoor-outdoor thermometer says it's 31
degrees and I know from experience that you can't get ice on the streets
when it's 31 degrees with what traffic we have going by.

My thermometer is very reliable; these things are pretty reliable
devices, and this one has been pretty well calibrated for a long time.

I have a lot more confidence in it than I do in Fred and it's an
argument for it not being icy outside.

Let's say we wanted to put these two arguments together. Well,
again, I want to have one space S that I do my probability calculations
on, keeping it distinct from the space T corresponding to the question
I'm interested in, whether or not it's icy.

Here I'm just working with the question as to whether or not Fred
has been truthful or careless and whether or not that thermometer is
working. (see Slide 5)

I have a 99 percent probability that the thermometer is working,
one percent probability that maybe something is wrong with it.

What do I do? Well, again I make a judgment back here in the
probability area. I make a judgment of independence, so I construct a
probability measure there. .8 times .99 is .792 probability that
Fred is going to be truthful and the thermometer is working.

This is the probability I would construct before I hear what Fred
said and see what the thermometer says. After I hear what Fred says and
see what the thermometer says I have to make a change because I know
that Fred and the thermometer are contradicting each other. This
possibility can't have happened so I eliminate it; I condition on
eliminating it and renormalize just as in the Bayesian story, except
here I'm working in the background and not directly on the frame of
interest.

These are the four possibilities in the background. These are
the elements of T that are compatible with them after I got the
knowledge of what Fred said and what the thermometer said. There's
nothing that's compatible with the thermometer working and Fred being
truthful, because they contradicted each other. (see Slide 6)




If Fred is truthful and the thermometer is not working, then that
means it is icy outside. If Fred is being careless and the thermometer
is working, that means it's not icy outside because the thermometer says
it isn't.

If Fred is being careless and the thermometer is not working, I
don't know what's going on then. That is compatible both with it being
icy and with it not being icy.

These are the initial probabilities based on that product
measure. I condition by eliminating this one and renormalizing. I get
.95 here and .04 here and .01 here so my degree of belief on yes is the
.04 and my degree of belief on no is the .95.

The point is I had more confidence in the thermometer than I did
in Fred, so in spite of Fred's testimony I have a degree of belief of
.95 in its working.

This is an example of Dempster's rule for combining belief
functions in general. The general story is you form in the background a
product measure, and you condition it on what's implied by the
compatibility relation between the two frames.
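
The Fred-and-thermometer combination can be checked with a short
Python sketch. It follows the spoken recipe only (product measure in
the background, eliminate what the compatibility relation rules out,
renormalize); the variable names and the way the compatibility
relation is encoded as sets are assumptions made for illustration.

```python
from itertools import product

fred = {"truthful": 0.8, "careless": 0.2}
thermometer = {"working": 0.99, "not working": 0.01}

# Answers in T = {"yes", "no"} (is it icy?) compatible with each state,
# given that Fred said "icy" and the thermometer reading argues "not icy".
fred_allows = {"truthful": {"yes"}, "careless": {"yes", "no"}}
thermo_allows = {"working": {"no"}, "not working": {"yes", "no"}}

# Product measure on the background frame (the judgment of independence).
joint = {(f, t): fred[f] * thermometer[t] for f, t in product(fred, thermometer)}

# Joint compatibility: intersect what each source allows, then drop the
# background states that are compatible with nothing and renormalize.
allowed = {(f, t): fred_allows[f] & thermo_allows[t] for (f, t) in joint}
possible = {state: pr for state, pr in joint.items() if allowed[state]}
total = sum(possible.values())
posterior = {state: pr / total for state, pr in possible.items()}

def belief(b):
    """Posterior probability of the states whose allowed answers lie in b."""
    return sum(pr for state, pr in posterior.items() if allowed[state] <= set(b))

bel_yes = belief({"yes"})
bel_no = belief({"no"})
```

Rounded to two places, bel_yes and bel_no come out as the .04 and .95
quoted above; the remaining .01 sits on the state that is compatible
with both answers.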

One more example: Let me talk about dependent evidence. In the
preceding example we had independent evidence in the background, so we
formed a product measure. But in many problems we don't have that kind
of independence. Let me just tell a story where we don't, to see what
would then happen with the belief functions.

The story is basically the same. I make a judgment that Fred is
80 percent reliable, and the thermometer is 99 percent reliable. But
this time I do not make the judgment of independence.

In the first story, I was saying, well, maybe Fred doesn't even
see the thermometer. Fred doesn't have anything to do with the
thermometer. It's on my desk.

In this story, let's say there is some dependence. Maybe Fred is
the one that pays more attention to the thermometer than I do, and in
fact if it weren't working it's likely that he would have known about
it. He probably would have known yesterday that it wasn't working, and
it's precisely in cases like this that Fred lets his fancy run loose.

If the thermometer had been working, it's not so likely that he
would have gone mumbling around about whether it was icy or not but with
the thermometer not working he feels he sort of lets his own mind run
free about what's going on. (see Slide 7)

Let's say that there's a 90 percent chance of Fred being
unreliable if the thermometer is not working. These three judgments are
enough to construct a probability distribution. We have a marginal
probability for his reliability and a conditional probability for his
reliability and a marginal probability for the thermometer's
reliability, and these are enough to determine a probability
distribution on our space of four elements here and here are the
probabilities that they determine.

Well, once again, after we see what Fred says and see what the
thermometer says, this compatibility relation eliminates this
possibility and so we renormalize and get the answers here, which are
slightly different from the ones we had before.

The .95 is the same. These are the first ones. These are the
second ones. The .95 are the same but whereas in the first case we had
a .04 here now we have .005. (see Slide 8)
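
The dependent case can be reconstructed from the three judgments
stated in the talk. Slides 7 and 8 are not reproduced here, so the
four cell probabilities below are derived, not quoted; the variable
names are illustrative.

```python
p_truthful = 0.8                 # marginal: Fred is reliable
p_working = 0.99                 # marginal: thermometer is reliable
p_careless_given_broken = 0.9    # conditional: Fred unreliable if it is not working

p_broken = 1.0 - p_working
joint = {
    ("careless", "not working"): p_careless_given_broken * p_broken,
    ("truthful", "not working"): (1.0 - p_careless_given_broken) * p_broken,
}
# The two remaining cells follow from Fred's marginal probabilities.
joint[("truthful", "working")] = p_truthful - joint[("truthful", "not working")]
joint[("careless", "working")] = (1.0 - p_truthful) - joint[("careless", "not working")]

# Fred and the thermometer contradict each other, so the state
# (truthful, working) is eliminated and the rest is renormalized.
possible = {s: pr for s, pr in joint.items() if s != ("truthful", "working")}
total = sum(possible.values())
posterior = {s: pr / total for s, pr in possible.items()}

bel_yes = posterior[("truthful", "not working")]   # compatible only with "yes"
bel_no = posterior[("careless", "working")]        # compatible only with "no"
```

Rounded, bel_no is again .95 and bel_yes drops to .005, in agreement
with the figures quoted from Slide 8.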

In this particular case, there is really no substantial
difference in what the probability judgments are but conceptually there
is quite a difference in what's going on because in the first case we
have two items of independent evidence and in the second case we had two
items of evidence which are dependent.

The idea is, as I said once already, we can do anything: all
the usual ideas of probability including independence or dependence are
there for us to work with. It's just that they're going on in the
background on a different frame instead of the one that's directly of
interest.

So what's going on here? Well, as I say, we have these two
frames, S and T. Now if I wanted to contrast the Bayesian with the
belief function approaches I would say that in general the Bayesian
argument acts as if the answer to the question of interest was
determined by chance whereas the belief function argument acts as if the
answer to a related question were determined by chance.

A constructive attitude is that we want to be explicit about the
fact that what we're really doing (when we make a probability judgment
of either kind) is comparing our actual situation to the picture of
chance.

More precisely, we're fitting the actual situation to a scale of
canonical examples, a scale of different pictures involving chance,
where the numbers are different. The difference between the Bayesian
and belief function stories is that we're using different scales of
canonical examples to which we're comparing our actual situation.

In either case, there is in general a problem of design. Since
we are comparing our whole problem to a picture of chance, we have to
have a general design for how we're doing that.

In all these examples I've been giving, we didn't just take
individual judgments and put them together in an arbitrary way. There
was a general overall comparison, a general design.


I'm trying to say more things than there is room for in one talk
but let me quickly shift back to the artificial intelligence story and
try to give a little more depth to the comments I was making earlier
about production rules.

Work in expert systems in artificial intelligence in the early
'70s and on into the late '70s, as I mentioned before, was largely based
on the idea of production rules.

What's a production rule? It's just an if-then statement. Why
was it interesting to people in artificial intelligence?

Well, it was interesting because it does in fact offer a way to
do very unstructured programming.

The idea is that perhaps you could represent your knowledge by a
large set of if-then rules, like if there is smoke then there is fire,
and perhaps you could apply those rules to current knowledge in a
very unstructured way.

The idea is you put your current knowledge in what you would call
a data base. So, for instance, in your data base might be the current
knowledge that there is smoke.

Now here's a very unstructured way the system might operate.
Suppose that what you do is you just start at the beginning and go
through all your production rules and with every production rule you
check whether the hypothesis matches something in the data base.

If it does, then you take the conclusion and put it in the data
base, too, so when you come to this production rule you say, aha, there
is smoke and you add to the data base the statement that there is fire,
so then you go through the whole set of production rules and then you
come back and start over.

Since there are new things in the data base, you may be able to
do new things the second time around that you didn't do the first time
around and if you have a lot of production rules in fact you can
construct very complicated arguments in this way.
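
The sweep just described can be sketched as a small forward-chaining
loop in Python. This is a minimal illustration of the idea, not the
code of any particular expert system; the function name, the rule
format and the "alarm" rule are hypothetical, and only the smoke/fire
rule comes from the talk.

```python
def forward_chain(rules, facts):
    """Repeatedly sweep the rules until a full pass adds nothing new.

    rules: list of (hypotheses, conclusion) pairs; hypotheses is a set.
    facts: the initial data base, a set of statements.
    """
    database = set(facts)
    changed = True
    while changed:
        changed = False
        for hypotheses, conclusion in rules:
            # Fire the rule if all its hypotheses are in the data base.
            if hypotheses <= database and conclusion not in database:
                database.add(conclusion)
                changed = True
    return database

rules = [
    ({"fire"}, "alarm"),   # hypothetical rule, listed first on purpose
    ({"smoke"}, "fire"),   # the rule from the talk
]
result = forward_chain(rules, {"smoke"})
```

Because the "alarm" rule is listed before the rule that produces
"fire", it cannot fire on the first sweep and only applies on the
second, which is the "start over" behavior described above.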

The nice thing about it is that you can do that without really
worrying about the details of how you're going to structure it. You can
just put enough production rules in there that the thing is going to
work and in fact it might work even if you took a few out. Some of the
production rules might only be helping you do things more quickly and
not be essential for getting them done.

A lot of the expert systems that were built in artificial
intelligence used this kind of a setup.


Of course the people that were working with this found themselves
trying to apply it to domains where they couldn't really put in these
absolute production rules, where all they could say were things like
"probably." So the question began, "Instead of just being able to say
if there's smoke there's fire, we can only say if there's smoke then
there is probably fire." Maybe putting some degree of probability here.
How can you adapt the production rule system to that? So you say
there's a probability there is smoke, and a probability on the
production rule; then when you put "there is fire" in there you should
have some probability connected with it, too.


So as you go through, you should be not only drawing conclusions,
but you should also be propagating probabilities.

Well, the people that were doing this were not. They knew
something about traditional ideas of probability but they were dealing
with a problem that wasn't familiar, so they asked how can we do
something that's sort of acceptable but follows this idea of propagating
probabilities.

One well known example is the PROSPECTOR system. This was
developed in the Stanford Research Institute. It was a system for
geological exploration, and they were encoding geologists' expert
knowledge with probability judgments attached with them, and they drew
pictures like this. (see slide). If there's a certain kind of rock,
then there is likely to be something else, et cetera. Certain items of
evidence suggest other items of evidence. They drew pictures of nets
where you go through items of evidence and eventually you get to the
hypotheses that interest you. (see Slide 9)

You see the links would correspond to if-then statements that
have probabilities attached to them. The links are the production
rules. For people that are accustomed to thinking about probability the
links seem to correspond to some sort of conditional probabilities. So
how are you going to use these links to get from judgments down here to
the judgments up here that you're interested in?

Well, the first thing that they noticed, of course, is that
Bayesian analysis requires more than just conditional probabilities.
You also have prior probabilities.

Well, okay, then you can have your expert give you not only
conditional probabilities but also prior probabilities on all these
nodes and then maybe your user could supply either the fact that certain
of these items at the bottom were true, or perhaps some new probability
judgment that they were true based on the specific case.

You want to take those new, either certainties or probabilities
at the bottom, and propagate them through the system.




Well, this doesn't fit very well with anything we're accustomed
to doing in probability theory, and this came out in terms of some
problems that were perceived with what was going on.

One problem can be described as saying the conditional
probabilities given by these links are not sufficient to determine;
even with the prior probabilities, they may not be sufficient to
determine the probability distribution on the whole space. They may not
be sufficient because you have the conditional probability of this,
given this, and this given this, and of this given this, but these are
only pair-wise probability judgments. You don't have anything that
involves three terms so that you can get the joint probabilities for
larger groups of elements.

On the other hand, what you do have may well be inconsistent. If
people just start throwing out these conditional probability judgments
you may not have anything that is consistent with an overall probability
distribution.

A third problem is that though there aren't any cycles in this
picture, if you sit down and just take in the information that a
geologist is willing to provide, you might have some conditional
probabilities going that way and also some going this way and if you
decide you are propagating from the bottom you might find out that in
the course of your propagation you're going around in circles, which
doesn't make any sense for what you're trying to do.

The PROSPECTOR people dealt with these problems in various ways
that were sort of ad hoc. I mean obviously you can deal with this
insufficiency problem by making various kinds of independence
assumptions. They did that.

From what they've published, or at least from what I've seen,
it's not clear, certainly all the details are not clear to me, to some
extent I think they used independence assumptions here. They also seem
to have used some max/min types of rules.

The consistency problem, well, they solved that again, sort of,
by making up their own kind of propagation rules, which really couldn't
be squared completely with the usual probability arguments, but which
did get away from some of the consistency problems.

As to the cycles, of course, they just put some rules in their
system that said if the geologist volunteered some information that was
going to create a cycle the system refuses to accept it or, if it does
accept it, then it rejects, takes something out that was already
there, so you deal with that just by brute force.

Well, that was interesting enough to people in artificial
intelligence that a lot of people asked the question is there some way
you can do this better, is there some way you can do an honest Bayesian
job with this propagation of probabilities business? I think that
question has been answered pretty well by Judea Pearl of UCLA and his
student Kim in some work they published in just the last couple of years
on the propagation of probabilities. They've settled, I think, just
what kinds of assumptions are needed for this kind of propagation.
One assumption is that you can't have a cycle. To do it in a
consistent Bayesian way, you have to have what is called a Chow tree,
which is very nearly a tree. Your graph cannot have any cycles, not
even any directed cycles.

Here you do have a cycle in a sense but you can't follow the
cycle the way the arrows are going. So that was okay, but in a Chow
tree you can't even have a cycle like this. Even this cycle was not
allowed. A cycle like this is not allowed, even one that you can't
follow around if you follow the arrows, so it's a pretty restricted
system. (see Slide 10)

But if you do assume this kind of a Chow tree and you do put
individual conditional probability judgments here, then you can actually
make sense.

A second point, if you interpret this tree in a causal way so
that you're thinking of the things down here as being causes of the
things up here, then you can make sense of the conditional
independence assumptions that are required in order to go from the
pair-wise judgments to a complete probability distribution.

Also, you don't have any consistency problem, so no matter what
these individual judgments are, they're consistent and they do determine
a probability distribution.

Third, Pearl and Kim have done this very elegantly, you can
propagate the probabilities. These nodes correspond to random
variables. So the arrows correspond not just to a single conditional
probability but to a whole set of conditional probabilities, conditional
probabilities for this variable given this one. So if you find out the
value of any variable you can condition the whole system on that, and do
so by local computations in one pass through the tree. You only have to
store locally information about the neighbors below, what probability
they're telling you, and the neighbors above, what likelihood
information they're giving you, and that can all be done locally.
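
To make the consistency point concrete, here is a small Python sketch
on a hypothetical three-variable chain A -> B -> C. All numbers and
names are invented for illustration. It shows that a prior on the root
plus one conditional table per arrow determines a joint distribution
that sums to one, and that observing one variable conditions the
others. Note that it does so by brute-force enumeration of the joint;
it is not the local, one-pass propagation scheme credited to Pearl and
Kim above.

```python
from itertools import product

# Hypothetical chain A -> B -> C of binary variables.
prior_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
p_c_given_b = {True: {True: 0.6, False: 0.4}, False: {True: 0.05, False: 0.95}}

# On a tree the pair-wise judgments multiply into a full joint distribution.
joint = {
    (a, b, c): prior_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product([True, False], repeat=3)
}
total = sum(joint.values())

# Condition the whole system on the observation C = True.
p_c_true = sum(pr for (a, b, c), pr in joint.items() if c)
p_a_given_c = sum(pr for (a, b, c), pr in joint.items() if a and c) / p_c_true
```

Whatever valid tables are substituted for the hypothetical ones, total
remains one, which is the sense in which the individual judgments
cannot be inconsistent on a tree.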

I think to most people's minds they have shown what can be done,
rigorously and Bayesianly, in the direction that the PROSPECTOR people
were trying to go.

VOICE: When you have a fairly complicated production system, you
don't even know which way the tree can go, right? I mean beforehand.

DR. SHAFER: That's right. I think that's the question we come
to here. Pearl and Kim showed what you can do in the direction that the
PROSPECTOR people were trying to go, but I think the question there is
what's left of the production system idea that you started out with, and
I think the answer is not much.


You have to have a very structured picture and you don't have
that modularity of knowledge that you did have. You don't have the
flexibility in representation of knowledge.

The only flexibility you can get is in the flexibility your
system might have in constructing trees like this by interacting with
the user, the expert or nonexpert user.

Is there any distinctive AI flavor left in this? Well, I don't
think so, not distinctive in the sense that this picture (that is,
Pearl's pictures) is exploiting work done by Chow in operations
research, and has gotten away from what was distinctively artificial
intelligence.

Well, let me quickly look at another case of what came out of
this expert system work in artificial intelligence in the '70s. The
MYCIN program, which I'm sure most of you have heard about, was actually
undertaken earlier than the PROSPECTOR program.

I talk about it second instead of first because it involved more
radical departure in the beginning from the pure production rule system.

Instead of having this picture of production rules where you scan
the data base to check whether there are any of the hypotheses satisfied
and then put the conclusions in if they are, you go backwards. You start
with a conclusion that you would like to get to. You scan the data
base. As you go through your production rules you scan the data base
and see whether the conclusion in your production rules matches the
things you're looking for in the data bases. The data base is now for
the moment what you would like to get to instead of what you're starting
with, so you just use the same system going backwards, and eventually
you can make the same kinds of arguments.
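
Going backwards can be sketched in Python as a goal-directed search
over the same kind of rules. This is a generic illustration of backward
chaining under assumed names and rule format; it is not MYCIN's actual
control strategy, and the "alarm" rule is hypothetical.

```python
def backward_chain(goal, rules, facts, _pending=None):
    """Return True if goal can be established from facts using rules.

    rules: list of (hypotheses, conclusion) pairs.
    _pending: goals currently being pursued, used to avoid going in circles.
    """
    pending = set() if _pending is None else _pending
    if goal in facts:
        return True
    if goal in pending:
        return False  # already trying to establish this goal higher up
    pending = pending | {goal}
    for hypotheses, conclusion in rules:
        # Look for rules whose conclusion matches what we want to reach,
        # then try to establish each of their hypotheses in turn.
        if conclusion == goal and all(
            backward_chain(h, rules, facts, pending) for h in hypotheses
        ):
            return True
    return False

rules = [({"smoke"}, "fire"), ({"fire"}, "alarm")]
reached = backward_chain("alarm", rules, {"smoke"})
```

Here the search starts from "alarm", finds the rule that concludes it,
and works back through "fire" to the known fact "smoke".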

That's a departure that's not quite as unstructured as the pure
production rule system because you have to make some programming
decisions about what conclusions you're going to try to draw.

The second way in which they differed from the PROSPECTOR people
was that from the outset they decided they weren't going to try to do
anything that was Bayesian, that could be justified in Bayesian terms.

They changed the words and instead of talking about probability
they talked about certainty factors so they had certainty factors in the
things they kept in their data base, and they made up their own calculus
for the certainty factors.

Well, the interesting thing from the point of view of belief
functions, of course, is these rules they made up turned out to be very
close to the rules for belief functions, very close to special cases for
demonstrations of the rules of combination for belief functions.


Eventually these folks got very interested in belief functions
and looked at the cases where their rules differed from the belief
function rules. In general, they seem to have drawn the conclusion that
the belief function rules were better, and that they should recast their
system in terms of belief functions.

I'm referring here to some recent work. Shortliffe was the guy
in whose thesis MYCIN was originally developed. Gordon is a student of
his and they have recently written a couple of papers in which they
pushed the belief function idea.

In a particular context, they say that when they look back at
their original diagnostic problem (they were dealing with medical
diagnoses), when they look at the particular diagnostic problems they
were considering they've decided now that those problems really had more
of a hierarchical structure than they used to think they did,
hierarchical in the sense that diseases form a diagnostic tree. I think
you get the idea without my talking very much. (see Slide 11)

You can break a general disease down into more specific diseases,
and then into yet more specific.

For instance, jaundice might be broken into a kind of jaundice
that comes from an intrinsic liver problem or comes from some problem
outside the liver; intrinsic liver problems like hepatitis or cirrhosis,
things outside the liver that could cause the liver to malfunction or
gallstones or problems with the pancreas.

They had the idea that perhaps the kinds of evidence they have
could be dealt with in terms of belief functions and those belief
functions would represent evidence that directly supported or refuted
certain subsets in this tree.

The idea is you might have some test that told you yes or no the
jaundice does seem to be the result of an intrinsic liver problem. That
would indicate evidence directly for this hypothesis which is equivalent
to the set consisting of these two hypotheses. The belief function idea
is appealing because with belief functions you can represent having
certain support for something here, without being specifically for
either of these.

The hierarchical tree business corresponds to the fact they don't
think that you're likely to get anything specifically that would support
the subset consisting of cirrhosis together with gallstones. They don't
see how you would get that kind of medical evidence because those two
don't go together naturally.

So in general, if you have a tree the elements at the bottom are
your sample space or your frame of discernment. You want to have the
final probability judgments on subsets of the terminal nodes but only
some of those subsets correspond to intermediate nodes. They think that
you would start with belief functions that were focused on these
intermediate nodes and then you could combine them by Dempster's Rule,
et cetera.
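
As a rough illustration of this idea, the sketch below combines two
belief functions focused on nodes of a small jaundice tree. It is a
minimal sketch under stated assumptions: the four terminal diseases
follow the example in the talk, but the two tests and their weights
(0.6 and 0.5) are hypothetical, and the code is not taken from Gordon
and Shortliffe.

```python
from itertools import product

# Terminal nodes of the tree: the frame of discernment.
FRAME = frozenset({"hepatitis", "cirrhosis", "gallstones", "pancreatic"})
# An intermediate node is the set of terminal nodes below it.
INTRAHEPATIC = frozenset({"hepatitis", "cirrhosis"})

def dempster(m1, m2):
    """Dempster's rule for mass functions given as {frozenset: mass}."""
    raw = {}
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y
        raw[z] = raw.get(z, 0.0) + mx * my
    conflict = raw.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: v / (1.0 - conflict) for s, v in raw.items()}

def bel(m, subset):
    """Degree of belief: total mass committed to subsets of `subset`."""
    return sum(v for s, v in m.items() if s <= subset)

# Hypothetical test supporting "intrinsic liver problem" as a whole.
liver_test = {INTRAHEPATIC: 0.6, FRAME: 0.4}
# Hypothetical second test supporting hepatitis specifically.
hepatitis_test = {frozenset({"hepatitis"}): 0.5, FRAME: 0.5}

combined = dempster(liver_test, hepatitis_test)
for name, node in [("intrahepatic", INTRAHEPATIC),
                   ("hepatitis", frozenset({"hepatitis"})),
                   ("cirrhosis", frozenset({"cirrhosis"}))]:
    print(name, round(bel(combined, node), 3))
```

With the first test alone, belief in the intermediate node is 0.6 while
belief in hepatitis and in cirrhosis is each zero, which is the
"support for something here, without being specifically for either of
these" that the speaker describes.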


Well, I have two comments on this. One is that Gordon and
Shortliffe are very concerned about the computational complexities that
result from Dempster's rule when you do this. They make some
suggestions for modifying Dempster's rule and they get something
computationally more feasible.

I think we can avoid that. I think that if we do exploit the
tree structure we can find ways to efficiently calculate the
probabilities or the degrees of belief that we need.

It's true that if you try to calculate degrees of belief for all
subsets down here, you get into something that is unfeasible. If you
recognize that what you really want is just degrees of belief for the
subsets generally in the tree which do have high degrees of belief and
you want to identify those subsets with high degrees of belief and find
out what those degrees of belief are, I think you can do things
efficiently from a computational point of view and you can use
Dempster's rule.

I think the second thing to say is that in this picture you are
likely to really need models for dependent evidence.


Considering something that might give you a medical test result,
a medical test that might give you some positive evidence that the liver
is involved directly compared to some other test that might be negative
specifically for pancreatic cancer, the uncertainties involved in those
two tests may not be independent.

So it's likely, it seems to me, that if you have a large number
of items of medical evidence you're going to need models for dependent
evidence in this picture as well as just the model for independent
evidence that the Dempster's Rule is corresponding to.

Also I would ask the same questions here as I did in the case of
the PROSPECTOR work. After you go this far and you find out that you're
really dealing with hierarchical hypotheses, you find out you want to
combine belief functions, you find out that you're going to have to have
some models for dependent evidence, what's happened to the modularity of
your knowledge and the flexibility that you started with when you were
dealing with the production system idea? I think you've gotten quite a
ways away from that.

My general conclusion from this picture (it's a little harsh,
but I think it's true) is that the effort to put probability into
production systems failed.

The reason it failed, one way of putting it, is that chunks of
probabilistic knowledge are bigger than individual production rules.




That's not as revealing, I think, as this way of putting the
problem. The point I was making when I was talking about canonical
examples and constructive probability involving comparison to canonical
examples is that probability judgment always involves design, because it
does involve an overall comparison of your problem to canonical
examples.

Another conclusion is that probability judgment in expert
systems is much like probability judgment on other problems, that the
same kinds of difficulties arise. I think you get the same kind of
contrast between Bayesian and belief function arguments. I haven't had
time to go into that in terms of examples.

In those little examples I was showing you, we saw an advantage
of belief functions over Bayes in that with belief functions you could
use less complete models. I think that the same kind of flexibility is
present in the artificial intelligence problems and does give a reason
for preferring belief functions in some problems.

At the same time, in these little examples we saw that the belief
function arguments sometimes do require models for dependent evidence.
That same kind of thing arises with the expert systems problem.

Another conclusion I think we have to draw is about the future of
expert systems that use probability. I don't think we can expect it to
involve distinctively artificial intelligence ideas.

Though the term "expert systems" began in artificial
intelligence, once they defined an expert system as a system that can
have expert capabilities, it became clear there are a lot of systems
that began outside the artificial intelligence community that are fully
competitive in that respect. Artificial intelligence clearly has a
problem now in redefining itself and deciding what part of expert
systems is going to remain a part of artificial intelligence.

Having made these negative comments about artificial
intelligence, I'd like to end with the comment that I think we should be
very interested in what could be done with probability in more genuine
artificial intelligence problems. As I said, apparently the reason
artificial intelligence has gotten interested in probability was because
they got away from the picture of nonnumerical inputs because they were
talking about systems with numerical inputs from humans. But leaving
aside expert systems, what about the argument that there shouldn't be
any probability in artificial intelligence if you're dealing with
nonnumerical input?

I think the field of artificial intelligence has outgrown that
argument. I think artificial intelligence is no longer defined in terms
of its contrast with number crunching.


There is a lot of sophistication about the levels of explanation
idea, that you could use probability ideas even though you weren't
starting with numerical inputs. You could imagine a system, like
people, which generates probability estimates itself and uses those and
combines them in various ways. Though I don't have time to talk about
it and I wouldn't have very much to say if I did, I think an important
area for people in probability to think about is the genuine artificial
intelligence problem: how do you use probability in a genuine
intelligence system that uses nonnumerical inputs, and where do the
numbers come from. I think the answer is in what the psychologists have
given us, some of the answers in the kinds of heuristics that humans use
and presumably machines would have to use, too.

Where do the designs come from? Here I think we have to have
bigger modules than production rules but there is still a lot to work on
and think about.


Thank  you  all 


IS FRED GOING TO SPEAK TRUTHFULLY OR CARELESSLY?

    S = {TRUTHFUL, CARELESS}

ARE THE STREETS ICY?

    T = {YES, NO}

EVIDENCE GIVES ME A PROBABILITY MEASURE ON S:

    P(TRUTHFUL) = .8
    P(CARELESS) = .2

(Slide 1)


IS FRED GOING TO SPEAK TRUTHFULLY OR CARELESSLY?

    S = {TRUTHFUL, CARELESS}

ARE THE STREETS ICY?

    T = {YES, NO}

FRED SAYS, "THE STREETS ARE ICY."

THIS CREATES A COMPATIBILITY RELATION BETWEEN S & T.

  - TRUTHFUL IS COMPATIBLE WITH YES BUT NOT WITH NO

  - CARELESS IS COMPATIBLE WITH YES AND WITH NO

(Slide 2)


Bel(B) = P{s | IF sCt, THEN t IS IN B}

    S = {TRUTHFUL, CARELESS}

    T = {YES, NO}

    Bel(YES) = .8
    Bel(NO)  = 0


BAYESIAN ANALYSIS

    S = {TRUTHFUL, CARELESS BUT TRUE, CARELESS BUT FALSE}
           .8            .2a                .2(1-a)
                         \_______________________/
                                    .2

    T = {YES,  NO}
          p   (1-p)

FORM PRODUCT MEASURE ON S x T, THEN CONDITION.

    (TRUTHFUL, YES)    (CARELESS BUT TRUE, YES)    (CARELESS BUT FALSE, NO)
         .8p                    .2ap                     .2(1-a)(1-p)

    P(YES) = (.8p + .2ap) / (.8p + .2ap + .2(1-a)(1-p))

(Slide 3)
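
The two analyses of the Fred example can be sketched in a few lines.
This is a minimal illustration, assuming the slide's numbers
P(truthful) = .8 and P(careless) = .2; the symbol names `p` (prior
probability that the streets are icy) and `a` (chance that a careless
Fred happens to be right) are this sketch's reading of the garbled
slide, and the function names are hypothetical.

```python
# Probability measure on S and the compatibility relation created when
# Fred says "the streets are icy".
P_S = {"truthful": 0.8, "careless": 0.2}
COMPATIBLE = {"truthful": {"yes"}, "careless": {"yes", "no"}}

def bel(subset):
    """Bel(B) = P{s : every t compatible with s lies in B}."""
    target = set(subset)
    return sum(prob for s, prob in P_S.items() if COMPATIBLE[s] <= target)

def bayes_p_yes(p, a):
    """Bayesian posterior P(yes) after conditioning the product measure.

    p: assumed prior probability that the streets are icy.
    a: assumed probability that a careless statement is nevertheless true.
    """
    numerator = 0.8 * p + 0.2 * a * p
    return numerator / (numerator + 0.2 * (1 - a) * (1 - p))

print(bel({"yes"}), bel({"no"}))
print(round(bayes_p_yes(p=0.5, a=0.5), 3))
```

The belief function needs only the two numbers on S, while the Bayesian
analysis also needs values for `p` and `a`, which is the sense in which
the talk says belief functions can use less complete models.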


    T = {YES, NO}

[word illegible] EVIDENCE:

    (1) FRED'S TESTIMONY: YES

    (2) THERMOMETER: NO

    S1 = {TRUTHFUL, CARELESS}

    S2 = {WORKING, NOT}

    S = S1 x S2

      = {(TRUTHFUL, WORKING),  (TRUTHFUL, NOT),
           .8 x .99 = .792       .8 x .01 = .008

         (CARELESS, WORKING),  (CARELESS, NOT)}
           .2 x .99 = .198       .2 x .01 = .002


    ELEMENT OF S            ELEMENTS OF T    INITIAL    POSTERIOR
                            COMPATIBLE

    (TRUTHFUL, WORKING)     none              .792        0
    (TRUTHFUL, NOT)         YES               .008       .04
    (CARELESS, WORKING)     NO                .198       .95
    (CARELESS, NOT)         YES, NO           .002       .01

    Bel(YES) = .04
    Bel(NO)  = .95
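
The table for the independent case can be recomputed as a check. The
sketch below is a minimal illustration using the slide's numbers (Fred
80% truthful, thermometer 99% working, Fred says yes, the thermometer
says no); the function and variable names are hypothetical.

```python
from itertools import product

P_FRED = {"truthful": 0.8, "careless": 0.2}
P_THERM = {"working": 0.99, "not": 0.01}

def compatible(fred, therm):
    """Answers in T compatible with one element of S1 x S2."""
    answers = {"yes", "no"}
    if fred == "truthful":    # Fred said yes, so a truthful Fred rules out no
        answers &= {"yes"}
    if therm == "working":    # thermometer said no, so a working one rules out yes
        answers &= {"no"}
    return answers

def posterior(joint):
    """Drop elements compatible with nothing, then renormalize."""
    kept = {s: p for s, p in joint.items() if compatible(*s)}
    total = sum(kept.values())
    return {s: p / total for s, p in kept.items()}

def bel(post, answer):
    """Mass of elements whose only compatible answer is `answer`."""
    return sum(p for s, p in post.items() if compatible(*s) == {answer})

# Independence: the initial measure on S1 x S2 is the product measure.
joint = {(f, t): pf * pt
         for (f, pf), (t, pt) in product(P_FRED.items(), P_THERM.items())}
post = posterior(joint)
print(round(bel(post, "yes"), 2), round(bel(post, "no"), 2))
```

The element (truthful, working) is compatible with neither answer, so
it is removed and the remaining mass .208 is rescaled to one.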


DEPENDENT EVIDENCE


DEMPSTER'S RULE OF COMBINATION


  - FRED 80% RELIABLE
  - THERMOMETER 99% RELIABLE
  - FRED HAS 90% CHANCE OF BEING
    UNRELIABLE IF THE THERMOMETER IS
    NOT WORKING


    Bel(NO)  = .95
    Bel(YES) = .005


(Slide 7)


(Slide 8)
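
The dependent case can be worked through in the same way. The sketch
below is an assumption-laden illustration: it reads the slide as giving
a marginal P(Fred truthful) = .8, P(thermometer working) = .99, and
P(Fred careless | thermometer not working) = .9, and fills in the joint
measure from those three judgments. That construction and all names are
this sketch's, not necessarily the speaker's.

```python
P_TRUTHFUL = 0.8                 # marginal reliability of Fred
P_WORKING = 0.99                 # marginal reliability of the thermometer
P_CARELESS_GIVEN_BROKEN = 0.9    # the dependence judgment

p_broken = 1 - P_WORKING
joint = {
    ("truthful", "not"): p_broken * (1 - P_CARELESS_GIVEN_BROKEN),
    ("careless", "not"): p_broken * P_CARELESS_GIVEN_BROKEN,
}
# The remaining cells are fixed by the two marginals.
joint[("truthful", "working")] = P_TRUTHFUL - joint[("truthful", "not")]
joint[("careless", "working")] = P_WORKING - joint[("truthful", "working")]

# Fred says yes and the thermometer says no, so (truthful, working) is
# impossible; condition on the other three elements.
kept = {s: p for s, p in joint.items() if s != ("truthful", "working")}
total = sum(kept.values())

bel_yes = kept[("truthful", "not")] / total      # only "yes" is compatible
bel_no = kept[("careless", "working")] / total   # only "no" is compatible
print(round(bel_yes, 3), round(bel_no, 3))
```

Compared with the independent case, belief in "no" is essentially
unchanged while belief in "yes" drops, because a broken thermometer now
makes a truthful Fred unlikely.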


DISCUSSION ON PRESENTATION OF GLENN SHAFER


DR. DeGROOT: There are two invited discussants for the papers at
this time: Professor Art Dempster from Harvard University and Stephen
Watson from Cambridge.

I'd like to call on Art Dempster for his comments.

DR. DEMPSTER: I may not be the best person to open up the
discussion here, since I approach it from a very different point of view
from most people who probably have many questions about what Glenn has
been talking about. I hope there's lots of time for those questions to
come up and clarifications to be made.

DR. DeGROOT: You don't have to use your full ten minutes.

DR. DEMPSTER: I probably will.

(Laughter.)

I think my role may be a little more to reinforce some of the
things that Glenn has been saying and to complement a few of them by
throwing out a few different kinds of ideas.

One of the things that we should be trying to do at this
conference is bridge various language gaps. Different fields,
engineering and AI and statistics, do tend to speak different languages
even when talking about the same things so perhaps we can learn each
other's language to some extent.

One of the things that to me is most appealing about probability
and belief functions is that behind it are some very nice,
straightforward, extremely simple mathematics which provides one with a
calculus, so one can operate within a mathematical framework which is
perfectly definite.

That's true of the Bayesian system and true of the generalization
or the weakening to belief functions. For me, my own motivation in all
this is much more to be moving toward practical things, moving toward
specific models that people can use.

I've always felt that Glenn's approach is a marvelous idea, but
we need to get in there and develop models.

I think one of the nice things about the belief function approach
to expert systems is that it's a very rich field of application and one
can do complicated things or simple things. The applied side of it, as
I understand it, is really just developing.


So I am interested and have gotten more interested in pushing
research in this field. I have the advantage of a very good student who
is now working on it, Augustine Kong, who is here in the room someplace.
I'm sure Augustine would like to tell us about some of his things if
there is time or it's appropriate.

One can take some of these tree structures that Glenn has been
talking about, some that don't have as many restrictions on them as the
Chow trees and so on, and can develop belief function arguments and
models and try to work out some of the difficult computational problems
associated with it. That's the sort of approach that Augustine and I
are trying to develop.

There are major technical problems there, specifying models and
developing algorithms that can work in any kind of realistic computer
time. Only then will we be able to test these models, test the systems,
get some feeling for whether they can stand up under criticism and so
on. For me it's not so much a matter of getting the axioms right and
the calculus right and so forth. That's sort of all there. We need to
use it and get some experience with it.

One thing that Glenn mentioned at the beginning was he was
wondering why wasn't probability in AI, say from the beginning, or much
more. That prompted me to think that, well, really probability isn't in
anything much. The educated people, even the most technically educated
people, generally don't think in terms of probability.

I spent a week last summer reading in economic theory and trying
to understand what economic theory had to do with economics. I don't
remember much about what I learned but I came away with one very strong
impression. That although basic to the theory was the idea that people
are out there with their own expectations and they're maximizing them,
you would never find the word "probability" in the index. No thought at
all where probabilities come from which underlie these things.

In one field after another it's like that. In statistics, since
mathematical statistics is so-called objective probabilities (which from
one point of view are just kind of a way of avoiding the whole idea of
probability), so AI isn't different in that regard. I think the
practical end of AI is, as Glenn said, forcing some real concern about
probability.

I think the statistics profession is to blame to some extent.
It's not just that people aren't familiar with statistics, but there's a
lot of confusion among statisticians which proceeds to their downplaying
probabilities.

One of Glenn's themes is that AI is not necessarily the home for
things like expert systems, at least completely.


One thing that I've noticed starting to read into some aspects of
AI, especially in the Bayesian area, is that there are a lot of
parallels between the way statisticians look at a problem in some
overall way and the way these people are doing it.

They're talking about levels of analysis. They start out doing
things that we would think of in statistics as being exploratory data
analysis at a low level: what is being perceived and recorded and so on.
Then they have higher levels where there are references back to
knowledge bases and things of that sort, which are much more in the
spirit of doing Bayesian inference with models that have been
established with exploratory or other more intuitive ways of doing
things.

The kind of methodology that's evolving there has its parallels
in the field of statistics. Another field that is involved is the field
of design because there are always questions in expert systems as to
what knowledge you are going to go out and try to bring into the
analysis.

There is kind of unfortunate double use of the word "design"
here. When Glenn uses "design," that's a term that I think came up in
his work with Tversky, which is designs for heuristics in making
probability judgments.

The other kind of design, designing what to pull out of the
available knowledge with limited resources for collecting data, that's
something that there's a lot of work done in statistics which should in
principle relate to these expert system applications.

I'm not as big on heuristics as Glenn is. I have kind of an
instinct that our knowledge has to come from empirical bases to a large
extent if we want to communicate with each other. I do look to this
kind of integrated picture of statistics as data analysis, and modeling,
and cycling back and forth, and doing Bayesian inference in all of these
things.

I think that that is the main source of the probabilities that
we're going to want to use in formal statistics.

Let me stop there.

(Applause.) 

DR. DeGROOT: Stephen Watson.

DR. WATSON: I don't want to take up too much of your time
because I think most of you will want to be talking yourselves about
these issues. I think the role of a discussant is maybe to be
provocative and say things which people would disagree with more than
anything else.


Ever since I've come into touch with belief function theory I've
found it fascinating and a very interesting new development. It led me
to come to lots of different conclusions. Several of them were
reinforced this morning by Glenn's talk, which I thought was extremely
interesting, and full of things to talk about.

There are just two points, however, I would like to share with
you this morning concerning belief functions and their use in AI
systems. These are the philosophical support for the theory and the
notion of independence in the theory, which seem to me to be crucial to
our understanding of this theory.

I take one of the roles of this conference to be a discussion of
what different kind of calculus one ought to use in expert systems.
It's clear there are competing claims for this. There is the Bayesian
claim which Glenn talked about and the problems with that. Then there
may be an alternative, or maybe it's just a different gloss on the same
thing, the belief function theory.

One of the points which Glenn made in the paper that he prepared
for this conference concerns the philosophical support for the whole
idea of belief functions, and related to that the philosophical support
for probability theory.

Now there are some people, some in this very room, who will say
that the only logically supportable way of handling uncertainty,
wherever it appears, is to use the concept of Bayesian probabilities.
That what we must do is to attempt to see how we can get over the
problems of complexity which arise in trying to apply them in practical
AI systems. That any other theory, fuzzy set theory, belief function
theory or whatever, is philosophically unacceptable.

Now I think that's an unacceptable view. It seems to me that the
philosophy of subjective probability is at best an article of faith.

There are lots of reasons to suggest that we as individuals do
not handle uncertainty in our own minds according to the rules of the
Bayesian calculus. What I find so exciting about belief function theory
is that, as I understand it, here is another framework for thinking
about uncertainty, looking at a different aspect of the way we naturally
present uncertainty.

However, it leads me on to suggest that what we academics need to
do is develop a new philosophy which relates to the belief function
idea, rather than starting out with the philosophy of subjective
probability as being the given as our basis for understanding these
things. That's the first, I hope, contentious point.

The second point concerns independence. I suspect that at a
conference like this there will be some people who know Shafer's theory
intimately, and others who have heard and are interested in it but
haven't really studied it. I fall somewhere between those two extremes.
One of the things which grabbed my attention in studying Shafer's theory
is, as I understand it, the concept of independence is not particularly
well advanced.

Glenn may disagree with me here and he knows better than anybody
else how well the concept of independence has been advanced.

Simple applications of the Dempster rule for combining evidence,
as Glenn said, only apply to independent pieces of evidence.

The question is when are two pieces of evidence independent and
when are they dependent. If they are dependent, what do you do?

I think there is as yet no satisfactory theory for dealing with
that part of the subject.

DR. DeGROOT: Let me open up the proceedings now to discussion
from the floor.

DR. SINGPURWALLA: I'll raise a question both to Glenn Shafer and
to Steve Watson.

Glenn mentioned the concept of independence. To me, independence
can only be understood within the notion of subjective probability.

When you refer to independence, what is it that you have in mind?

DR. DeGROOT: You want to take a second to answer that? I think
it would be helpful.

DR. SHAFER: I don't know how much I can say that's useful but
the general philosophical point of view that I was trying to advance a
while ago is that what we're doing when we're making probability
judgment is that we're comparing an actual problem to a picture of games
of chance.

What you're saying is we do understand the picture of games of
chance. You say independence is a probabilistic notion. Another way of
saying that is what we do when we're talking about dice or perhaps about
protons and photons: we do know what independence means.

DR. SINGPURWALLA: I can only understand independence using the
calculus of probability. Is that what you're referring to when you say
independence?

DR. SHAFER: Yes, and no. I am referring to independence as we
understand it in the calculus of probability. If you're saying you're
taking a practical problem where the calculus of probability is not
there yet but you're making probability judgments, you're constructing a
probability argument. You're comparing the practical problem to the
picture of chance. Then you seem to have to make some kinds of
intuitive judgments that that picture of independence fits.


DR. DeGROOT: It sounds like a topic that may be a recurring
event in this conference.

DR. SINGPURWALLA: Can I make another comment, please?

DR. DeGROOT: You're paying. Go ahead.

(Laughter) 

DR. SINGPURWALLA: The other thing you mentioned pertains to
those hierarchical trees. I just want to draw your attention to the fact
that in reliability we draw what are called fault trees and trace,
because the events of interest are failures, the causes of failure. The
calculus of probability has been used there quite satisfactorily. The
main difficulties there happen to be computational. The tree gets very
big and the question is how much time does it take to compute the top
event. Of course the question of dependence and independence (as I
understand it) arises there too: for that we use probabilistic models
for dependence, and these are not readily available. There are of
course few models for dependence.

DR.  SHAFER:  My  only  comment  is  yes,  it  seems  to  me  those  are 
problems.  There  must  also  be  a  problem  about  whether  you  know  enough  to 
fill  in  the  details  in  all  the  conditional  probabilities  and 
probabilities  in  the  tree. 

DR. SINGPURWALLA: That is the difficult part. We have to assign
conditional ones.

The  idea  is  that  you  start  with  a  very,  very  low  level  in  the 
hierarchy  where  you  can  assign  probabilities  based  on  whatever  reasons 
that  you  may  have,  experimentation  etc.  Then  you  build  up  the 
conditional  probabilities  and  you  build  up  the  higher  level 
probabilities  by  using,  again,  standard  calculus  of  probability. 
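
The bottom-up calculation described here can be sketched for a very
small fault tree. This is a minimal illustration that assumes the basic
events are independent, which is exactly the assumption the speaker
flags as problematic; the component names and failure probabilities are
hypothetical.

```python
from math import prod

def p_and(probs):
    """AND gate: the output fails only if every independent input fails."""
    return prod(probs)

def p_or(probs):
    """OR gate: the output fails if at least one independent input fails."""
    return 1 - prod(1 - p for p in probs)

# Hypothetical basic-event probabilities at the lowest level of the tree.
pump_a, pump_b, valve = 0.01, 0.02, 0.001

# Intermediate event: both redundant pumps fail.
pumps_fail = p_and([pump_a, pump_b])
# Top event: the system fails if the pumps fail or the valve fails.
top_event = p_or([pumps_fail, valve])
print(round(top_event, 7))
```

With dependent basic events the two gate formulas no longer hold, and
the conditional probabilities mentioned above have to be assigned
explicitly.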

The Rasmussen report is an example of where this kind of thinking
was used. I'm wondering if the artificial intelligence community and
the expert system community have seen those kinds of things, or have
those kinds of issues entered into that particular scheme of things?

DR.  SHAFER:  I  think,  yes,  certainly  that  is  a  widely  applicable 
idea,  as  applicable  to  diagnosing  what's  wrong  with  a  car  as  it  is  to 
diagnosing  what's  wrong  with  the  human  body. 

I think the use of those diagnostic trees is fairly late in this
artificial intelligence story I'm talking about. That hasn't been a
central theme; it's only in some of this recent work of Gordon and
Shortliffe where that came out.

DR. DeGROOT: Let me just interject one technical question that I
have.


It seems to me in your definition here using this compatibility
relationship that the belief function would include those S's that were
not compatible with any T's under your definition.

Does that make sense?

DR. SHAFER: Right. Part of the definition of compatible
relations should be that for every S there is a T that's compatible with
it.


When you do the rule of combination that gets violated and that's
why you're going back and eliminating some of those S's by conditioning
on the original space S.

DR. WISE: I'm Ben Wise of Carnegie-Mellon University.

When Steve Watson mentioned philosophical underpinnings of Bayes
versus belief functions, one of those underpinnings is decision theory
and how if you actually use a Bayesian decision rule you do minimize
your expected loss.

I was wondering do you have any analog to decision theory based
on belief functions and an argument coming out of that, that decisions
based on belief functions will actually be good.

DR. WATSON: Could I just interject there and say that when I
talked about philosophical underpinnings I was thinking of philosophical
theories of probability particularly and not a decision theory concept
based on expected loss.

Expected loss is either a notion which is ad hoc or it's derived
from utility theory and probability theory which is axiomatized in lots
of different ways, and I was thinking of an axiomatization for belief
function theory or any other theory which might be different from that
of probability theory.

DR. SHAFER: I do think that's a very interesting area. I could
refer you to an unpublished paper that I've written on the subject if
you're interested, but mainly it's concerned with the critique of the
Bayesian underpinning decision, of the Bayesian utility justifications,
rather than on any very great progress in a positive direction.

DR. WISE: If I may elaborate, do you have an idea of even how
you would evaluate a decision rule based on belief functions if you
can't use expected loss?

DR. SHAFER: Well, if you look at the mathematics of belief
functions, the probabilities are going on in the space in the
background, and the points in that space in the background are
translated as subsets in the space in the foreground.




Now if you want to mesh that with a utility idea, I think the
natural question to ask is why should the utility be attached to points
in the space in the foreground?

Perhaps the utility itself should be attached to subsets. I
think that you'll get a more interesting mesh if you take that approach.

It does seem to me that there is very strong argument against
Art. That's the line that kind of interests me. I don't know how far we
can go with that.

DR.  DeGROOT:  Again.  1  think  this  raises  an  issue  that  will  ooae 
up  aany  tlaes  today  and  toaorrow,  naaely  what  is  the  operational  aeaning 
of  belief? 

My own view is that the purpose of an expert system, or indeed
any information processing system, is ultimately to make decisions and I
know how to use probabilities in decision-making. I don't know how to
use beliefs in decision-making.

Again, I think that will be undoubtedly expressed again and again
through the next few days and we'll hear, I hope, many different and
interesting answers to that.

Are there other comments? David Spiegelhalter.

DR. SPIEGELHALTER: Just a note, really. Just to point out what
was said about the demands of probability may lose the modularity of
production rules. I think that that has also been realised that that's
not suitable by people working in artificial intelligence on matters not
concerning probability but on matters of explanation and control. The
idea of having loose production rules which you can just plug in and
take out has largely disappeared and much more structured knowledge
bases now are becoming the norm.

DR. SHAFER: We see that's true of McDermott and people that are
pushing like the XCON or DART. This may not be the best thinking that's
going on in artificial intelligence, but it's still a very strong
strand.

DR. SPIEGELHALTER: I was thinking much more in terms of the
structure in CADUCEUS and structures in MYCIN and the control of the
meta language, meta control that the Stanford group now has gone over to
largely and the idea of a totally modular unstructured system.

DR.  DeGROOT:  Professor  Zadeh,  do  you  have  anything? 

DR. ZADEH: I'd like to comment on a question that was raised by
Mr. Wise and that is that personally I prefer to use the terms upper and
lower probabilities to belief and plausibility.


I think that when you associate the term belief with lower
probability, you tend to read more into it than really is justified, so
that if you look at it in terms of upper and lower probability, if you
look in terms of probability bounds, that's basically the point of view
taken in Dempster's original paper.

Then the answer to the question that was posed is that
correspondingly then you will have upper and lower expected values.
That's what you will have. In other words, whatever values you compute
based on incomplete information will be in terms of bounds and it will
be then up to the decision analyst to decide what to do, if all you know
is that the expected value lies between alpha and beta.
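
The upper and lower expected values mentioned here can be sketched
numerically. The following is a minimal illustration, assuming the
incomplete information is given as a Dempster-style mass assignment over
subsets; the outcomes, masses, and payoffs are hypothetical and the
function name is ours, not from the talk.

```python
def expectation_bounds(mass, value):
    """Return (lower, upper) expected value for a mass assignment.

    mass  : dict mapping frozenset of outcomes -> probability mass
    value : dict mapping outcome -> numeric payoff
    Each focal set contributes its worst payoff to the lower bound
    and its best payoff to the upper bound.
    """
    lower = sum(m * min(value[x] for x in focal) for focal, m in mass.items())
    upper = sum(m * max(value[x] for x in focal) for focal, m in mass.items())
    return lower, upper


# Hypothetical example: mass 0.5 is committed to outcome "a" alone,
# 0.3 to the set {"b", "c"}, and 0.2 is left on the whole frame.
mass = {
    frozenset({"a"}): 0.5,
    frozenset({"b", "c"}): 0.3,
    frozenset({"a", "b", "c"}): 0.2,
}
value = {"a": 10.0, "b": 0.0, "c": 4.0}

alpha, beta = expectation_bounds(mass, value)
print(alpha, beta)  # the expected value lies between these two bounds
```

What to do once only the interval between the two bounds is known is
left, as stated above, to the decision analyst.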

DR. DeGROOT: My thought about that comment is that the argument
against probability is that it's so difficult to specify precise
probabilities that one really can't use them in a practical way in
large-scale and important problems. The suggestion therefore that they
should be replaced by upper and lower bounds seems to be saying it's too
hard to specify a single number, therefore we'll specify very precisely
two numbers: an upper and lower bound. I find that very difficult to
accept.

DR. YAGER: I just want to make one comment about that idea of
modularity you were talking about - losing modularity.

Perhaps what may happen is that you'll have, instead of it
being totally modular as it is now, you may have some chunks of
information that are sort of clumped together by some sort of
probabilistic information so instead of the whole thing being
independent you'll have sort of groups of information being independent.

DR. SHAFER: Spiegelhalter may have more to respond to that than
I do.

DR. SPIEGELHALTER: Those are highly structured groups of maybe
up to ten nodes with full probability distribution defined on these, and
yet each of these is essentially independent.

DR. DeGROOT: Well, it is time for coffee, I think. I would like
to thank everyone. It seems to me that one conclusion I've gotten out
of this is that adherents of the Bayesian approach are called Bayesians
and to them the rest of the world is non-Bayesian, and I gather that
adherents of belief functions are believers and the rest of the world is
nonbelievers. (Laughter).

(Recess.) 




FUZZY  SETS  AND  POSSIBILITY  THEORY 


Lotfi A. Zadeh
Computer Science Division
University of California, Berkeley


Presentation was based on material published previously:

L. A. Zadeh, "The Role of Fuzzy Logic in the
Management of Uncertainty in
Expert Systems," Fuzzy Sets and
Systems 11 (1983) 199-227.

L. A. Zadeh, "A Simple View of the Dempster-Shafer
Theory of Evidence," Berkeley Cognitive
Science Report Series, University of
California, Berkeley (1984).

L. A. Zadeh, "Fuzzy Sets and Information Granularity,"
Advances in Fuzzy Set Theory and Applications,
M. M. Gupta, R. K. Ragade,
R. R. Yager (editors), North Holland
Publishing Co. (1979).


TRANSCRIPT  OF  ORAL  PRESENTATION  BY  LOTFI  ZADEH: 
FUZZY  SETS  AND  POSSIBILITY  THEORY 


DR. ZADEH: To place what I have to say in the proper
perspective, let me say something about my perception of how the theme
of this workshop fits into some of the issues in the case of expert
systems.


If you look at the evolution of scientific progress in various
fields, you observe the following. You always start with a
deterministic theory. Gradually the realization develops that knowledge
in that field is not really deterministic. The next stage is to go to a
probabilistic description. Eventually it turns out that your knowledge
of probability is incomplete. So then you add incompleteness to it.

And this is really the situation we have in the case of expert systems.

The knowledge that is in the knowledge base of an expert system
typically is incomplete.

The question then is how do you come to grips with the problem of
incompleteness. The Bayesian approach, as I understand it, is to say
that there is no such thing as incompleteness.


Professor Lindley disagrees with me. Perhaps he might correct me
on this point. You assume that you can always make up for
incompleteness through the use of subjective probabilities. That is one
point of view.

The point of view that is taken in Dempster-Shafer theory is that
you do have incompleteness in your knowledge of probabilities. This is
what Glenn alluded to in his statement that you deal essentially with an
incomplete model. I think Dempster used the word "weakening."

When you see this sort of a thing the incompleteness in your
knowledge propagates down to your conclusion, as a result of which the
probability has become interval-valued in some sense. So you have to
speak about the lower and upper probabilities, or belief and
plausibility.

Another approach is based on the use of the maximum entropy
principle. Here I do have incomplete information, but I am going to
make up for it by making certain assumptions, which, in some
unsophisticated way, are assumptions about independence.

Not knowing what the joint probability is you assume that it is
essentially the product of the margins, or roughly like that. Of course
I am oversimplifying things. You make up for it in that fashion.

Personally I feel that the spirit of the Dempster-Shafer theory
is proper. That is, you should not really make up for incompleteness by
making all sorts of assumptions, regarding either independence or
assumptions about probabilities that you really don't know.




It is at this point then that Possibility Theory enters into the
picture. Possibility Theory is essentially a theory of incompleteness.
Let me try to explain what is involved by starting with a very simple
thing.

Suppose you have a variable X. You say X is equal to A. You
assign a value to X. A weaker statement might be that X belongs to some
set A. Now, when you say that X belongs to A we are making a
possibilistic statement.

We are saying that the possible values of X lie in this set. But
this sort of possibility is the 1-0 possibility. It is the all-or-
nothing possibility.

The possibility that is used in possibility theory is a matter of
degree. Generally it would be associated with statements of the form X
is A, rather than X belongs to A. For example, X is small; X is large;
Mary is tall; and so forth. Statements of that kind are possibilistic
statements, but in this case possibility is a matter of degree.

We can talk about physical possibility; for example, the
possibility that you might lift 50 pounds, 100 pounds, 150 pounds and so
forth. As you start with zero pounds and go on to 500 pounds, there is
a gradual transition from being able to do it quite easily to not being
able to do it at all. Notice that when you talk about possibility in
this sense, there is nothing probabilistic about it. It is simply a
matter of ease of attainment. But you are talking at this point in
terms of physical possibilities.

Consider possibilities that are associated with statements or
propositions of the form "X is small." The interpretation that you put
on that is that "small" is something that can be stretched. In talking
about the degree of stretch, if I say Mary is young, and Mary in fact is
35, then you have to stretch the concept of young to a certain degree to
accommodate the value 35. We have stretch but it is not physical
stretch. It is a conceptual stretch. This is really what happens. As
far as I am concerned then, the Dempster-Shafer theory is one in which
you have some knowledge about probabilities and you have some knowledge
about possibilities. It is a mixed theory. It is certainly a step in
the right direction because it is a generalization of classical
probability theory.

Fuzzy logic or fuzzy sets come into the picture in the following
way.

In using fuzzy logic, you do not have this separation between
logic and probability as we have in the case of classical logic. They
are under the same roof. Here is a fuzzy logic.

First, you have the representational component. The
representational component has to do with taking something that is
expressed in a natural language and translating it into a more precise
language.


It is the sort of thing that you do when you use predicate
calculus; you take something and express it in the form of predicate
calculus.

Then you also have an inferential component. The inferential
component comes into the picture once you have represented knowledge,
expressed in a natural language, in that more mathematical, more precise
form.


How can you infer certain propositions from other propositions?
Generally that requires the solution of a nonlinear program. What
happens is that in classical logic we use tools such as resolution,
things of this kind, modus ponens.

From the point of view of fuzzy logic, these classical rules are
merely degenerate forms of nonlinear programming. If you take this
point of view, you begin to see more clearly why things work the way
they do.

Fuzzy logic subsumes probability theory through the use of
numerical, or more generally, fuzzy quantifiers. In the case of fuzzy
logic, for the most part you deal not with probabilities but with
quantifiers, like several, many, most, few, and so forth which in some
sense brings probability theory to the pre-Kolmogorov era.

In other words, this is the way probability theory was looked at
many years ago, except that you incorporate it into this quantifier
structure. Furthermore, you allow these quantifiers to be fuzzy.

The concept of a quantifier is closer to human intuition than the
concept of probability. You really don't have to use probability. You
can formulate, for example, Dempster-Shafer theories and other theories
without ever mentioning probability theory. If I have a chance, I will
show this later.

In the case of expert systems, much knowledge is the knowledge of
"usual values." What one can do, and this is what I am trying to do at
this point, is develop a theory perhaps called the theory of usuality.


The concept of usuality, the concept of usual value, differs from
the concept of expected value. I think it is really more relevant to
decision-making. Let me give a simple example of what I have in mind.

An expected value is simply an average value. If I tell you that
the average value in some location is 70 degrees, it doesn't mean that
much; it could be extremely hot in the summer and extremely cold in the
winter.


But if I say that the usual value is 70 degrees it means much
more because the usual value is really representative of what the value
is, whereas an expected value is not.




So the question then is how does the concept of usuality tie in
with expert systems, and how can one develop what might be called a
calculus of usuality.

The concept of usuality is closely related to another concept, that
of disposition. A disposition is a proposition which is, for the most
part (but not necessarily always) true.

The knowledge and the common sense knowledge that we have
consists mostly of these dispositions. Furthermore, the concept of a
disposition manifests itself in many other ways. Here are some
examples.

As a proposition: slimness is attractive.

As a valuation: it takes about five minutes to reach the
station.

As a command: avoid overexertion.

As a ranking: Swedes are taller than Greeks.

As similarity: Spaniards resemble Italians.

As causality: overeating causes obesity.

As typicality: a typical Swede is tall and blond.

As usuality: usually pork is much cheaper than veal.

What I am trying to say is that in our preoccupation with
probabilities we have tended to lose sight of the fact that much of the
information on which decisions are based does not really fit the
classical model of probability theory.

Technically, a disposition is a proposition which is preponderantly
but not necessarily always true. A disposition is a proposition with
implicit fuzzy quantifiers.

For example: birds can fly: most birds can fly. Young men like
young women; most young men like mostly young women. So what happens is
that when we assert something that is a disposition, there are implicit
fuzzy quantifiers.

You can interpret this quantifier in terms of probabilities, but
I prefer not to do so and stay on the level of these quantifiers. What
happens then is that if you take these dispositions and apply modus
ponens or modus tollens, then you get certain decision principles.

For example, if A implies B, then to achieve B, do A. Slimness
is attractive. To be attractive, be slim.


Notice that these things don't guarantee that if you achieve
slimness you will also achieve attractiveness. This is simply a
tendency, a disposition. Or you may have a negative decision principle.
If A implies B, then to avoid B, avoid A. Overeating causes obesity; to
avoid obesity, avoid overeating. What you have to do is make knowledge
of that kind more precise.

Let's start with something like "slimness is attractive." That
can be interpreted in a number of ways. For example, most of the slim
are attractive. Most of the attractive are slim. There are many more
attractive individuals among the slim than in the general population,
and so forth. There are at least nine different ways in which you can
interpret this. Any one of these nine or more different ways are
admissible. In other words, if you want to say that I interpret it in
the sense of No. 5, that is perfectly all right. The next question is
how would this statement be interpreted or understood in usual
conversation. Will it be sense No. 1, or sense No. 2, or whatever?
There is the issue.

These fuzzy quantifiers that enter the picture are generally
second-order predicates which characterize the absolute or relative
count of elements in one or more fuzzy sets. In the case of fuzzy sets,
you have a class that doesn't have sharply defined boundaries;
therefore, it does not make sense to ask how many elements are in that
set.

Nevertheless, you can take the classical definition of
cardinality and extend it to fuzzy sets. The simplest extension and the
one that is presented on the next transparency involves simply adding up
the grades of membership of the elements in the set.

This is called the sigma count. If you have a fuzzy set, that is
represented by those dotted lines, and if you have a point U and it has
the grade of membership μ in the set, you simply add up the grades of
membership.

This is a little bit like full time equivalent. That is the way
university administrators add the number of faculty and students and so
forth. You can form the relative sigma count, which is the sigma count
of the intersection divided by the sigma count of A.
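
The sigma count and relative sigma count just described can be sketched
in a few lines. This is an illustrative sketch only: it assumes a finite
universe, fuzzy sets stored as dictionaries of membership grades, and
the minimum as the intersection operator (a common choice that the talk
does not spell out). The names and grades are hypothetical.

```python
def sigma_count(fuzzy_set):
    """Sigma count: the sum of the grades of membership."""
    return sum(fuzzy_set.values())


def relative_sigma_count(b, a):
    """Relative sigma count of B in A.

    Sigma count of the intersection of A and B divided by the
    sigma count of A. Intersection is taken pointwise with min,
    which is an assumption of this sketch.
    """
    intersection = {u: min(a[u], b.get(u, 0.0)) for u in a}
    return sigma_count(intersection) / sigma_count(a)


# Hypothetical grades of membership for four individuals.
slim = {"ann": 1.0, "bob": 0.5, "cy": 0.5, "dee": 0.0}
attractive = {"ann": 0.5, "bob": 1.0, "cy": 0.25, "dee": 1.0}

count_slim = sigma_count(slim)
proportion = relative_sigma_count(attractive, slim)
print(count_slim, proportion)
```

As with full-time equivalents, a half member counts as half an element,
so the count of a fuzzy set need not be an integer.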

If you say that Q A's are B's, in effect you are saying that
relative sigma count of B in A is Q. Notice that I am saying is Q not
equals Q because it is understood that Q is a fuzzy number. That is why
I am not equating the sigma count of B in A to Q. With this definition
of a quantifier you can manipulate the fuzzy numbers. In other words
when you say "most," you are characterizing in a fuzzy sort of way the
proportion of elements of one kind in elements of another kind.

"Most," for example, then would be a fuzzy number like this.

This fuzzy number is a possibility distribution. That is, if you say
that something is "most," then this would present the possibility that
that proportion has a specific numerical value.


For example, if .6 is somewhere here, you read this value here,
and that will give you the possibility that it has that value. Once you
have defined these as fuzzy numbers, you can manipulate them using fuzzy
arithmetic. For example, you can square them. You can add them. You
can divide them. You can do various things with these fuzzy numbers.
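
Reading a possibility off the curve for "most," and squaring the fuzzy
number, might look as follows. The shape chosen for "most" (zero up to
0.5, rising linearly to one at 0.8) is purely an assumption for
illustration; the transparency's actual curve is not reproduced in the
transcript. Squaring uses the extension principle, which on the unit
interval reduces to evaluating the original curve at the square root.

```python
import math


def most(proportion):
    """Possibility that 'most' denotes the given proportion.

    Hypothetical piecewise-linear shape: 0 up to 0.5, 1 from 0.8 on.
    """
    return min(1.0, max(0.0, (proportion - 0.5) / 0.3))


def most_squared(value):
    """Possibility distribution of the square of 'most'.

    By the extension principle the possibility of v is the
    possibility of the proportion u with u * u == v; on [0, 1]
    that u is the square root of v.
    """
    return most(math.sqrt(value))


print(most(0.6))           # possibility that the proportion is 0.6
print(most(0.9))           # fully possible under this shape
print(most_squared(0.36))  # same possibility as most(0.6)
```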

You can also represent them as ultra-fuzzy sets so that in this
particular case the possibility distribution in itself is also fuzzy.

It is at this point that the concept of usuality comes into the
picture. Usually it is interpreted as a quantifier, in the following
sense.

You say that "usually" X is F. What is implied is first of all,
there is a conditioning variable Z. You say that if Z is not some
exceptional case, or you make a positive assertion: it belongs to a
certain set, which is the set of normal values, then most X's are F's.
That is the conditioned version. The unconditioned version is simply
usually X is F, which is most X's are F's. Here is an example: it
takes a little over an hour to drive from Berkeley to Stanford.

Now notice that as it stands, the word "usually" does not appear
in there. It is implicit. So D stands for disposition and R here
stands for restoration. You are restoring disposition if you make the
quantifier explicit.

So you say what it really means, here it is unconditioned, is
that usually that duration is little over one hour. "Little over one
hour" is a possibility distribution. You are giving the possible values
of that variable but this is not a probability distribution. The
probabilities indirectly come in the use of the word "usually."

Now you can condition that. You can say that if departure is not
rush hour, then usually duration is a little over one hour. And then
you can define more precisely what you mean by using what is called
test-score semantics.

Test-score semantics tells you the following. It says I will be
able to tell you the degree of agreement of that proposition with what
is in the data base if you tell me the entries in the data base.

The meaning itself is the procedure. It is not the values in the
data base. That is why this data base is called an explanatory data
base. It is something that you construct for purposes of explanation.

In effect you are saying that if you gave me a record in which
you have trip one, trip two, trip three and so forth, and these things
here, point 8, point 3, this is the degree to which the duration for
that trip agrees with little over one hour, in this case you will agree
to degree .8, .3, and this is the degree to which the time of departure
agreed with the constraint "not rush hour."




If you give data like that, then by going through this
computation which also involves definition of "most" (I will not go into
the details) you will be able to compute the degree of agreement.

It is that computation that defines the meaning of the
proposition which is expressed in natural language.
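
The details of the computation are not given in the talk, so the
following is only one plausible sketch of a test-score procedure for
"if departure is not rush hour, then usually duration is a little over
one hour." It assumes the relative sigma count with min as intersection,
and a hypothetical piecewise-linear definition of the quantifier; the
trip records are made-up data.

```python
def usually(proportion):
    """Hypothetical fuzzy quantifier: 0 up to 0.5, 1 from 0.9 on."""
    return min(1.0, max(0.0, (proportion - 0.5) / 0.4))


def degree_of_agreement(records):
    """Degree to which the explanatory data base agrees with the proposition.

    records: list of (duration_grade, departure_grade) pairs, where
      duration_grade  = agreement of the trip's duration with
                        'a little over one hour'
      departure_grade = agreement of the departure time with
                        'not rush hour'
    The proportion is the relative sigma count of well-timed trips
    among trips that did not leave at rush hour.
    """
    both = sum(min(duration, departure) for duration, departure in records)
    not_rush = sum(departure for _, departure in records)
    return usually(both / not_rush)


# Explanatory data base: trip one, trip two, and so forth (made up).
trips = [(0.8, 1.0), (0.3, 0.5), (1.0, 1.0), (0.9, 0.5)]
score = degree_of_agreement(trips)
print(score)
```

The point carried over from the talk is that the meaning lies in the
procedure `degree_of_agreement`, not in the particular records supplied.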

DR. GROSS: What does the .8 and .3 actually mean when you say
"the agreement?"

DR. ZADEH: This is a question which is something that most
people raise. In other words, how do you get these numbers? The
assumption here is that these numbers are your perception of the
agreement. It is the sort of a thing that humans are good at but we do
not understand too well how we do it. Now frequently you are asked to
fill out a questionnaire, or write a letter of recommendation, or
whatever. In filling out this questionnaire they have a certain
scale, on a scale from zero to ten indicating degree to which this
student is outstanding or whatever, indicating something.

People don't have too much difficulty in doing that without
really understanding how we do it. Olympic judges do that. This is a
basic issue. I am not trying to minimize its importance but for the
moment I want to put it aside.

Let's assume that in one way or another if somebody asks you the
question "it took an hour and a half to get from Berkeley to Stanford;
to what degree does this 1.5 hour agree with your perception of 'about
one hour'?"

So you will say .2, .3. That is how those numbers are obtained.

What happens is this: The contention here is that what we call
the knowledge base consists really of the knowledge of usual values.
And notice one thing, that if we did not know what the usual values are,
you wouldn't be able to do a thing.

Now the reason we can function is because you know that it takes
about one minute to get from here to the elevator; it takes about five
minutes to get from here to something. It takes about three dollars to
have lunch in this cafeteria. You have this tremendous store of
information about the usual values, not just of various parameters, but
also usual values of relations, for example, it may be that something
is much larger than something else, and so on.

When you say the usual value of X is F, what you mean by that is
that usually X is F. When we have usual values of a pair (X, Y), that
is a relation really. That means usually (X, Y) is R. For example,
usually X is much larger than Y. Usually X is small. Usually most X's
are small and so forth. Now the usual value of X, as I have indicated
already, is not the expected value of X.


Now what matters in decision analysis is the usual rather than
expected value of X.

DR. SHAFER: Does it very often matter what the usual value of
the distance from the usual value is? Do you want very often to take
into account how far you usually are from the usual value?

DR. ZADEH: Yes. I will come to that in a moment. Let's look at
that a little more carefully. Perhaps that may respond at least in part,
to your question.

Suppose we have what is called disposition evaluation: usually X
is F. What do we mean by that really? Now first of all, suppose that
this F is about alpha. So here at this point you have a possibility
distribution.

Now in special cases it would be an interval, for example between
something and something. But this is now a possibility distribution.
Is something small, large, young, something like that.

Now usually is also a possibility distribution. Here you have
then a mixture of two possibility distributions, one defining usually
and the other defining what the value of the variable is. When you say
usually X is F, you have to interpret that. This is where the
representational component of fuzzy logic comes in the picture.

It says that you should interpret it in the following fashion.
If you know the probability distribution on X, let's assume we have the
simple probability density, then take the membership function that is
associated with the distribution and calculate this integral which is
the expected value basically in probabilistic terms of this
characteristic function.

You substitute that into the definition of usually and you come
up with a number. That number is the possibility of the probability
density. It is a possibility of the probability.
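
The two steps just described (integrate the membership function against
the density, then substitute the result into "usually") can be sketched
numerically. Everything concrete below is an assumption made for
illustration: the membership function for "a little over one hour," the
uniform density, the shape of "usually," and the midpoint-rule
integration.

```python
def mu_little_over_one_hour(hours):
    """Hypothetical membership function for F = 'a little over one hour'."""
    if 1.0 <= hours <= 1.25:
        return 1.0
    if 1.25 < hours < 1.5:
        return (1.5 - hours) / 0.25
    return 0.0


def density(hours):
    """Hypothetical probability density of X: uniform on [1.0, 1.5]."""
    return 2.0 if 1.0 <= hours <= 1.5 else 0.0


def mu_usually(proportion):
    """Hypothetical possibility distribution for 'usually'."""
    return min(1.0, max(0.0, (proportion - 0.5) / 0.4))


def probability_of_fuzzy_event(mu, pdf, low, high, steps=1500):
    """Midpoint-rule approximation of the integral of mu(x) * pdf(x)."""
    width = (high - low) / steps
    total = 0.0
    for i in range(steps):
        x = low + (i + 0.5) * width
        total += mu(x) * pdf(x)
    return total * width


# Step 1: expected value of the membership function under the density.
prob = probability_of_fuzzy_event(mu_little_over_one_hour, density, 0.5, 2.0)
# Step 2: substitute into 'usually' to get the possibility of this density.
possibility = mu_usually(prob)
print(prob, possibility)
```

Under these assumptions the integral is about 0.75, so this particular
density is possible to degree about 0.625 given "usually X is F."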

In other words, it says that when you tell me that usually X is F
you are giving me information about the possibility distribution of the
probability distribution. That is what you are really telling me.

Now this is simply probability. Then there is the issue of
informativeness. How informative is a statement like that? There are
two things involved: one is the specificity of F. Specificity is a
concept that Ron Yager has written on extensively. It is essentially
how narrow that thing is, how restrictive it is.

Obviously you are not giving too much information if this thing
is very broad. Nor are you giving any information if "usually" is very
broad. So the informativeness of that piece of knowledge then is a
function of the specificity of this and the specificity of that. (And
for simplicity you can form the product if you want to define
"informativeness").


So this is the way in which "usually" would be interpreted. Now
how can you compute with this sort of a thing. Let's take a very simple
example. Suppose that you know the usual value of X and you know the
usual value of Y. What would be the usual value of X plus Y, the most
elementary question you can raise. In the case of expectations, of
course, we know it is A plus B. Now if you say for simplicity (let's
assume that these are numbers), X is A and Y is B, then X plus Y is A
plus B in fuzzy arithmetic.

Fuzzy arithmetic is a generalization of interval arithmetic. In
the case of interval arithmetic, you are dealing with possibility
distributions that are either zero or one.

When you say that the number is in this interval, you are saying
that the possibility that it is there is one and the possibility that it
is outside is zero. So this addition of fuzzy numbers is a
generalization of the addition of interval-valued numbers. In effect
you are saying then that X plus Y is this. What you can show is that
usually X is A and usually Y is B implies that usually plus X plus Y is
A plus B. Plus means it is a little bit narrower. How much narrower
you can't tell without having more information about these things.
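
The sense in which fuzzy addition generalizes interval addition can be
sketched with triangular fuzzy numbers. This is an illustrative sketch,
assuming each fuzzy number is a (left, peak, right) triple; the spreads
chosen for "about $3" and "about $10" are hypothetical. At every
possibility level alpha, the level set of the sum is the interval sum of
the level sets, and it covers only the fuzzy-arithmetic part "X plus Y
is A plus B," not the treatment of the quantifier "usually."

```python
def add_intervals(a, b):
    """Ordinary interval addition: [a0, a1] + [b0, b1]."""
    return (a[0] + b[0], a[1] + b[1])


def alpha_cut(triangular, alpha):
    """Interval of values whose possibility is at least alpha."""
    left, peak, right = triangular
    return (left + alpha * (peak - left), right - alpha * (right - peak))


def add_triangular(a, b):
    """Sum of two triangular fuzzy numbers, componentwise."""
    return tuple(x + y for x, y in zip(a, b))


about_3 = (2.0, 3.0, 4.0)     # hypothetical shape for "about $3"
about_10 = (8.0, 10.0, 12.0)  # hypothetical shape for "about $10"

total = add_triangular(about_3, about_10)
print(total)

# At any level, the cut of the sum equals the interval sum of the cuts.
level = 0.5
cut_of_sum = alpha_cut(total, level)
sum_of_cuts = add_intervals(alpha_cut(about_3, level), alpha_cut(about_10, level))
print(cut_of_sum, sum_of_cuts)
```

An ordinary interval is the special case in which the possibility is one
inside and zero outside, so every level set is the same interval.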

The statement becomes a little bit sharper. It is not just
usually X plus Y, but it is usually plus X plus Y is A plus B. If you
wanted to perform a more careful analysis of this, consider the
following practical problem.

You have dinner at a restaurant. You have the cost of the
appetizer; the cost of the entree; the cost of dessert. You have cake.
So the total cost is expressed by this. You know the usual values:
usually X is about $3, usually Y is about $10, and so on. What is the
usual value for the dinner? If you take the same problem that I
considered previously, Z is equal to X plus Y, instead of the kind of
result in the previous slide (where you take the sum of A plus B and ask
what quantifier can you associate with A plus B) you ask another kind of
question.

You ask if I want to stick to usually, what is that value that I
can associate (not necessarily A plus B) with the sum. It is a
different kind of question.

I will not go through this analysis but it is more complex.
Again dealing with possibilities, it turns out that eventually you have
to solve certain equations involving these possibilities and that in
turn will require the solution of a nonlinear program. In general the
nonlinear programs that result from formulations of this kind are
continuous nonlinear programs. In other words, they are programs in
which you don't have simply vector-valued variables, but rather the
variables are probability densities and things of this kind. It becomes
expensive to solve these problems. At this point you do not have quick
and dirty ways of solving nonlinear program problems of that kind. This
is a question that was raised by Glenn in his talk.


Is  thsrs  s  distlnotlvs  kl  flavor,  soMthlng  othsr  than  straight 
probability  analysis,  in  tha  easa  of  axpart  systaas?  1  think  that  this 
is  in  that  spirit.  It  is  not  in  tha  spirit  of  elassioal  probability 
thaory. 

It is a mixture of logic together with fuzzy quantifier
probability. This is a dispositional modus ponens and is really what is
needed to be able to propagate this knowledge of probabilities upward on
the trees or inference networks that Glenn mentioned.

Suppose you know that Q X is F. Now Q is some sort of a
quantifier. It could be "usually" but it doesn't have to be. It could
be "about 80 percent of the time." It could be anything you wish.

Q X is F. If X is F, then Q Y is G. Notice that as was mentioned
already in the case of production systems, usually you have rules of the
"if A, then B" kind. If A, then B. And you also have facts about A.

Here that fact about A is dispositional, in the sense that there
is a fuzzy value associated with it and also this Q. My contention is
that this is really the kind of information that people have. This is
the kind of information that geologists have. This is the kind of
information that doctors have. Furthermore there are the rules.

Here the assumption is that you allow this quantifier, which
could be interpreted as probability if you wish, on the right side of
the rule. It turns out that the conclusion that you can reach here is
Q^2 (Y is G), where Q^2 is the square of that fuzzy number that is
associated with Q. For example, usually X is F. If X is F, then
usually Y is G. Therefore, usually^2 (Y is G). This is the sort of
conclusion.

Here is the picture. Usually^2 is less specific. As a result of
the use of these chains of inference, the results become fuzzier and
fuzzier and fuzzier.
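The loss of specificity under chaining can be illustrated with a minimal Python sketch. It assumes "usually" is a triangular fuzzy number on the unit interval (the particular shape 0.7, 0.85, 1.0 is a hypothetical choice, not one given in the talk) and raises its support interval to successive powers, as repeated dispositional modus ponens would.

```python
from typing import Tuple

Interval = Tuple[float, float]
Triangle = Tuple[float, float, float]  # (left foot, peak, right foot); assumed encoding

USUALLY: Triangle = (0.7, 0.85, 1.0)   # hypothetical shape for the quantifier "usually"


def alpha_cut(tri: Triangle, alpha: float) -> Interval:
    """Interval of proportions whose membership is >= alpha."""
    left, peak, right = tri
    return (left + alpha * (peak - left), right - alpha * (right - peak))


def power(cut: Interval, n: int) -> Interval:
    """n-fold product of a nonnegative interval with itself."""
    return (cut[0] ** n, cut[1] ** n)


def support_width(n: int) -> float:
    """Width of the support of usually^n, a crude measure of non-specificity."""
    low, high = power(alpha_cut(USUALLY, 0.0), n)
    return high - low


chain_lengths = (1, 2, 3, 5)
widths = [support_width(n) for n in chain_lengths]
for n, w in zip(chain_lengths, widths):
    print(f"usually^{n}: support width {w:.3f}")
```

Under these assumptions the support of usually^n runs from 0.7^n to 1, so each extra link in the chain widens it.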

Basically the conclusion that emerges is that, in general, these
chains of inference cannot be long. In some sense they can be arbitrarily
long but then the validity of the conclusion is very much in question.

Basically you can't use long chains of reasoning if your
information is imprecise. Here is another rule. In fuzzy sets and
possibility theory there is a rule that is called the compositional rule
of inference.

The compositional rule of inference is shown here. It says X is
F. X and Y are G. Therefore, Y is the composition of these two
relations. F is a unary relation and G is a binary relation. Notice
that there is nothing that is probabilistic about this sort of thing.
Mary is tall. John is much taller than Mary. Therefore, John is the
composition of tall and much taller.


How this is actually performed is shown over here. You have to
work with the membership function. Then you take the supremum over this
sort of a thing and then that is the result.

This may be viewed as the analog of the classical Bayesian rule;
the probability distribution of Y is the convolution of the probability
distribution of X and the probability distribution of Y given X.

In the case of fuzzy sets, instead of dealing with integrals, you
usually deal with suprema and instead of dealing with products you deal
with the min operator.

If you take the standard formula in probability theory (the
probability of Y is equal to the integral or summation of probability of
Y given X times the probability of X), this is its analog, the
composition rule.
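The parallel between the two rules can be sketched in a few lines of Python. The sketch below implements sup-min composition on finite universes alongside the sum-product rule of probability; the height grid, the membership grades for "tall" and "much taller", and all function names are illustrative assumptions rather than values from the talk.

```python
from typing import Dict, Hashable, Tuple

Unary = Dict[Hashable, float]
Binary = Dict[Tuple[Hashable, Hashable], float]


def compose_sup_min(mu_x: Unary, rel: Binary) -> Unary:
    """Possibilistic rule: mu_Y(y) = sup over x of min(mu_X(x), mu_G(x, y))."""
    out: Unary = {}
    for (x, y), grade in rel.items():
        out[y] = max(out.get(y, 0.0), min(mu_x.get(x, 0.0), grade))
    return out


def compose_sum_product(p_x: Unary, p_y_given_x: Binary) -> Unary:
    """Probabilistic analog: P(y) = sum over x of P(y | x) * P(x)."""
    out: Unary = {}
    for (x, y), p in p_y_given_x.items():
        out[y] = out.get(y, 0.0) + p_x.get(x, 0.0) * p
    return out


def much_taller_grade(diff_cm: int) -> float:
    """Hypothetical grade of 'much taller' as a function of height difference."""
    if diff_cm >= 30:
        return 1.0
    if diff_cm >= 20:
        return 0.7
    if diff_cm >= 10:
        return 0.2
    return 0.0


MARY_HEIGHTS = (160, 170, 180)
JOHN_HEIGHTS = (170, 180, 190, 200)
tall = {160: 0.1, 170: 0.5, 180: 1.0}  # hypothetical membership of "tall" for Mary
much_taller = {(m, j): much_taller_grade(j - m)
               for m in MARY_HEIGHTS for j in JOHN_HEIGHTS}

john = compose_sup_min(tall, much_taller)
print(john)

p_x = {"a": 0.25, "b": 0.75}
p_y_given_x = {("a", "u"): 1.0, ("a", "v"): 0.0, ("b", "u"): 0.5, ("b", "v"): 0.5}
p_y = compose_sum_product(p_x, p_y_given_x)
print(p_y)
```

The two functions have the same loop structure; only the pair of operations differs (max and min in place of sum and product).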

Then the question arises, what happens if you qualify these
things probabilistically, or here in terms of usual values. That
usually X is F and usually X and Y are G. You qualify these things.
Again, this is like Glenn mentioned in his talk, that you take these
rules and associate some probabilities or certainty factors (or something
like that) with these rules.

This is in that spirit. The question is can you say that usually^2
Y is F composed with G. The answer at this point is: I am not sure. It
looks reasonable but in order to justify this one has to go through a
reasonably complicated analysis. I have some transparencies here but I
will not show them and I have to solve some problems again in nonlinear
programming.

It is possible that that result may be good; again I am not sure.
Here is the nonlinear programming problem to which this reduces.

This here is greater than or equal to the supremum of this rather
messy looking expression over here and we have to find the smallest
membership function which satisfies that.

What happens is that in principle you can take problems like that
and reduce them to the solution of these nonlinear programs. In
practice, the stumbling block at this point would be how to solve those
nonlinear programs in an approximate fashion, because what is going to
happen is this. If you want to solve these nonlinear programs using
standard software that is available for solution of such problems, you
would be wasting the capability of a computer to solve these problems
precisely because you are not really interested in the precise answers
to these questions. You are interested in something that is about the
same order of precision as the original data, which is not that high.

This suggests a view of decision analysis that is again different
from the classical one, which might perhaps be called dispositional
decision analysis.

In this dispositional decision analysis you assume that the
values of various parameters are dispositional valuations. For example,
alpha is a parameter. You say it is something which is a fuzzy value,
but there is an additional quantifier (implicit quantifier) which is
"usually."

"Usually" a cup of coffee costs about 50 cents. What happens is
this: you can take some standard problems in decision analysis (this is
one involving Markovian decision analysis) where you have a system with
a number of states. You have transitions from states; you have the
conditional probability for each transition; and you know also the cost
associated with the transition. You want to find a policy, that is, what
input to apply when you are in a specified state, in such a way as to
minimize the expected cost.

This problem has been considered extensively in the literature.
By using dynamic programming it can be reduced to the solution of a
functional equation. The dispositional aspect comes into the picture in
the following fashion.

These transitions would be of the form which is shown here. If the
input is alpha K and you are in state QY, then the next state is
"usually something." That is the way it would be specified. How can
you find an optimal strategy for a situation in which that is the kind
of information that is available? It turns out that one can take this
equation (which is the one that one finds in standard Markovian
approaches) and this can be solved by using fixed point methods. That is
one way of solving it, regarding that as an equation of the form that X
is equal to some function of X, and using fixed point iteration. This
fixed point will now be a fuzzy set. It will not be a simple form. It
will be a cloudy thing, a fuzzy set, but in the literature on fuzzy sets
there are several papers which deal with the extension of various fixed
point theorems to fuzzy sets, and those extensions may be relevant to
the solution of this problem.
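The idea of solving an equation of the form X = f(X) by fixed point iteration, with a non-crisp result, can be sketched in Python under strong simplifying assumptions. The sketch uses ordinary discounted value iteration and replaces the fuzzy "usually" specification with interval-valued transition costs, solving once for the lower ends and once for the upper ends. The discount factor, the two-state system, and all numbers are hypothetical, and this is not the fuzzy fixed point method referred to in the talk.

```python
from typing import List

Costs = List[List[float]]        # costs[state][action]
Trans = List[List[List[float]]]  # trans[state][action][next_state] = probability


def bellman_step(values: List[float], costs: Costs, trans: Trans,
                 gamma: float) -> List[float]:
    """One application of f: best expected cost-to-go from each state."""
    new_values = []
    for s in range(len(values)):
        best = min(
            costs[s][a] + gamma * sum(p * values[t] for t, p in enumerate(trans[s][a]))
            for a in range(len(costs[s]))
        )
        new_values.append(best)
    return new_values


def fixed_point(costs: Costs, trans: Trans, gamma: float = 0.9,
                tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Iterate X <- f(X) until successive iterates agree to within tol."""
    values = [0.0] * len(costs)
    for _ in range(max_iter):
        new_values = bellman_step(values, costs, trans, gamma)
        if max(abs(x - y) for x, y in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values


# Hypothetical two-state, two-action system with interval-valued costs.
trans = [[[0.9, 0.1], [0.2, 0.8]],
         [[0.5, 0.5], [0.0, 1.0]]]
costs_low = [[1.0, 2.0], [0.5, 3.0]]
costs_high = [[1.5, 2.5], [1.0, 3.5]]

value_low = fixed_point(costs_low, trans)
value_high = fixed_point(costs_high, trans)
for s, (lo, hi) in enumerate(zip(value_low, value_high)):
    print(f"state {s}: expected cost in [{lo:.3f}, {hi:.3f}]")
```

Because the update is monotone in the costs, the two runs bracket the expected cost for any cost table lying between the two bounds; one such bracket can be read as a single level cut of a fuzzy-valued answer.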

There is something that has been generating a lot of interest in
recent years in AI, and this is the subject of non-monotonic reasoning.
It relates to the issue of dispositionality. As far as I can see, what
people call non-monotonic reasoning is simply probabilistic reasoning.
It is not non-monotonic. In other words, I question the use of the term
non-monotonic to describe that.

Here is a simple example of what people call non-monotonic
reasoning: Slinky is a bird. What is in parentheses here is what is
implicit. Birds can fly. Therefore, Slinky can fly.

Again, here in parentheses you have some form of qualification,
like it is very likely that Slinky can fly. Disregarding that, then you
say Slinky can fly. Then you have additional information.


Slinky  is  an  ostrich.  Ostriches  are  birds.  Ostriches  cannot 
fly.  Therefore,  Slinky  cannot  fly.  Now  people  say  that  this  is 
non-monotonic  reasoning  because  here  you  are  of  the  conclusion  that 
Slinky can fly and here you are of the conclusion that Slinky cannot
fly.

In  classical  logic  that  cannot  happen.  In  classical  logic,  as 
you  add  propositions  to  your  premises,  the  truth  value  of  a  conclusion 
can  never  change.  If  it  has  been  established  to  be  true  it  will 
continue  to  be  true. 

The  point  I  am  trying  to  make  is  that  if  you  look  at  this  as 
probabilistic  reasoning  there  is  nothing  that  is  nonmonotonic.  It  is 
simply  a  matter  of  revision  of  belief  in  the  very  classical  sense.  That 
is,  here  you  have  two  pieces  of  information;  that  Slinky  is  a  bird  and 
Slinky  is  an  ostrich. 

But ostriches are a subset of birds. So you have the conditional
probability of A (flying) given that it is a bird, C, and an ostrich, B,
but that is the same as the conditional probability of A given B because
B is a subset of C, by the simple rules of conditioning.

Therefore,  the  probability  that  Slinky  can  fly  is  zero;  Slinky 
cannot  fly,  because  you  know  it  is  an  ostrich.  So  the  reason  why  people 
think  it  is  non-monotonic  is  because  those  implicit  probabilities  are 
disregarded. 
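The conditioning argument can be checked on a toy population with a minimal Python sketch. The counts are hypothetical; the point is only that "bird and ostrich" selects the same individuals as "ostrich", so adding the second premise is ordinary conditioning on more information.

```python
from typing import Callable, Dict, List

Animal = Dict[str, bool]


def conditional(population: List[Animal],
                event: Callable[[Animal], bool],
                given: Callable[[Animal], bool]) -> float:
    """Relative frequency of `event` among the individuals satisfying `given`."""
    matching = [x for x in population if given(x)]
    return sum(1 for x in matching if event(x)) / len(matching)


# Hypothetical population: 1000 birds, 10 of which are ostriches.
population: List[Animal] = (
    [{"bird": True, "ostrich": False, "flies": True}] * 950
    + [{"bird": True, "ostrich": False, "flies": False}] * 40
    + [{"bird": True, "ostrich": True, "flies": False}] * 10
)

p_fly_given_bird = conditional(population, lambda x: x["flies"], lambda x: x["bird"])
p_fly_given_ostrich = conditional(population, lambda x: x["flies"], lambda x: x["ostrich"])
p_fly_given_both = conditional(population, lambda x: x["flies"],
                               lambda x: x["bird"] and x["ostrich"])

print(p_fly_given_bird, p_fly_given_ostrich, p_fly_given_both)
```

With the implicit "very likely" made explicit as a conditional frequency, the first conclusion is never asserted outright, so nothing has to be withdrawn when the ostrich premise arrives.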

Once you have made them explicit, then the non-monotonicity
disappears.  One  important  issue  that  arises  is  how  can  you  combine 
evidence  using  these  quantifiers? 

The  basic  idea  is  that  you  use  fuzzy  syllogisms.  What  is  a  fuzzy 
syllogism?  It  is  something  of  the  following  form: 

Q1 A's are B's; Q2 C's are D's; ?Q3 E's are F's.

Now  here  are  some  examples.  Most  young  women  are  slim.  Many  slim  women 
are  attractive.  What  fraction  of  young  women  are  attractive?  Now  A,  B, 
C,  D,  Q2  are  all  assumed  to  be  fuzzy. 

Now  depending  on  how  A  and  B  and  C  and  D  are  constrained,  you  get 
different  syllogisms.  For  example,  chaining  involves  a  situation  in 
which  this  B  is  the  same  as  this  C,  this  E  is  the  same  as  A,  and  this  F 
is  the  same  as  D. 

Then you have consequent conjunction, antecedent conjunction,
intersection product and so forth. You have a number of syllogisms
depending on how these things are constrained.

The  Dempster-Shafer  rule  of  combination  relates  to  just  one  of 
these  syllogisms.  In  other  words,  you  need  not  have  just  one  rule  of 
combination  but  many  in  order  to  be  able  to  chain,  to  deal  with 
disjunctions  and  various  other  things. 


Here is an example of a syllogism in fuzzy logic. Q1 A's are
B's. Q2 (A's and B's) are C's. Therefore, (Q1 times Q2) A's are B's and
C's, from which you can infer that at least (Q1 times Q2) A's are C's,
from which in turn you can infer that (Q1 times Q2) A's are C's, if Q1
and Q2 are monotonic quantifiers or monotonic numbers.
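The first two steps of this syllogism can be checked numerically with a short Python sketch. It assumes relative sigma-counts computed with min as the intersection, that is, ΣCount(B/A) = Σ min(μA, μB) / Σ μA; the membership grades are hypothetical.

```python
from typing import List

Fuzzy = List[float]  # membership grades over a common finite universe


def intersect(x: Fuzzy, y: Fuzzy) -> Fuzzy:
    """Fuzzy intersection using min."""
    return [min(a, b) for a, b in zip(x, y)]


def rel_sigma_count(target: Fuzzy, base: Fuzzy) -> float:
    """Proportion of base's that are target's: sum of min(base, target) over sum of base."""
    return sum(intersect(base, target)) / sum(base)


# Hypothetical membership grades for four objects.
A = [1.0, 0.8, 0.6, 0.3]
B = [0.9, 0.7, 0.2, 0.5]
C = [0.6, 0.9, 0.4, 0.1]

q1 = rel_sigma_count(B, A)                   # Q1 A's are B's
q2 = rel_sigma_count(C, intersect(A, B))     # Q2 (A and B)'s are C's
q_bc = rel_sigma_count(intersect(B, C), A)   # proportion of A's that are (B and C)'s
q_c = rel_sigma_count(C, A)                  # proportion of A's that are C's

print(q1 * q2, q_bc, q_c)
```

For point-valued proportions the product q1 * q2 reproduces the proportion of A's that are (B and C)'s exactly, and the proportion of A's that are C's can only be at least that large, which is the "at least" step in the argument.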

This intersection product syllogism in probability theory would
correspond to the basic formula in which you have expressed the
probability of A given B and C or probability of B given C in terms of
some of the constituent probabilities, so that if expressed in terms of
quantifiers, it would be a very elementary rule in probability theory.

The difference is this. All of these things that appear there
are allowed to be fuzzy, whereas in probability theory they are not
allowed to be fuzzy. Events are not allowed to be fuzzy.

This intersection product syllogism then can be applied here to
situations like this: Q1 A's are B's; Q2 (A's and B's) are C's. That is
a syllogism. Now if you put (almost all) A's are B's and all B's are
C's, then from that you can infer that all A's and B's are C's.

From these two now, you can infer that (almost all-times-all) A's
are B's and C's and it is understood that this product is a product in
fuzzy arithmetic rather than a product in ordinary arithmetic.

And in (almost all-times-all), "all" acts as unity, so "almost
all-times-all" is the same as almost all: (almost all) A's are B's and
C's, from which you can infer that at least (almost all) A's are C's and
from which you can infer that (almost all) A's are C's.

From (almost all) A's are B's and all B's are C's you can infer
almost all A's are C's. I should like to note the classical syllogism
of Aristotle. All A's are B's, all B's are C's, therefore all A's are
C's. That is a standard syllogism.

This is a variation on that where in the first premise you relax
a little bit and make it "almost all A's are B's." Then it turns out
that almost all A's are C's. However, if instead of introducing "almost
all" in the first premise (the major premise) you introduce it in the
minor premise, then this whole thing will collapse. In other words, it
is brittle.

The difficulty then is that if at some point you replace (in
terms of probabilities) probability one by probability one minus epsilon
the whole chain of reasoning collapses completely. Whereas in other
premises, it is okay. What complicates this and has not been considered
in the theory of expert systems is that the place for introducing these
probabilities is critical.
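The brittleness can be seen with ordinary crisp sets, as in the following minimal Python sketch. The two hypothetical constructions relax "all" to 99 percent in one premise at a time: first in the premise about A's and B's, then in the premise about B's and C's.

```python
from typing import Set


def fraction(xs: Set[int], ys: Set[int]) -> float:
    """Fraction of the members of xs that are also members of ys."""
    return len(xs & ys) / len(xs)


# Case 1: almost all A's are B's, all B's are C's.
A1 = set(range(100))
B1 = set(range(1, 200))      # contains 99 of the 100 A's
C1 = set(range(1, 300))      # contains every B
case1 = (fraction(A1, B1), fraction(B1, C1), fraction(A1, C1))

# Case 2: all A's are B's, almost all B's are C's.
B2 = set(range(10_000))
A2 = set(range(100))         # every A is a B
C2 = set(range(100, 10_000)) # 99% of the B's, but exactly the B's that are not A's
case2 = (fraction(A2, B2), fraction(B2, C2), fraction(A2, C2))

print("case 1:", case1)
print("case 2:", case2)
```

In the first case the conclusion degrades gracefully to 99 percent of the A's; in the second, the same 1 percent relaxation is compatible with none of the A's being C's.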

It is a little bit like what happens to round-off error in the
course of performing numerical computation. In some places it is okay.
In other places it can have disastrous results. This is the problem.

People assume that these computations are robust, but in fact
they may not be robust. What happens is, for example: This is a
consequent conjunction syllogism.

Q1 A's are B's. Q2 A's are C's. What is the fraction of A's that
are B's and C's? It turns out that there is a bound on Q. Now these are
operations on fuzzy numbers and in particular if Q1 and Q2 are most,
then the bound on Q is two-times-most minus one on one side and most on
the other side.

What happens is that this incompleteness of information
translates into bounds on quantifiers. This is one of the serious
weaknesses of these traditional approaches to dealing with uncertainties
in expert systems. There is an implicit assumption of compositionality.
That assumption is made in MYCIN. It is made in PROSPECTOR. It is made
in all the systems that have been devised so far. That is, you assume
that if you associate a certainty factor with a certain rule, and another
certainty factor with another rule, and these two rules are combined,
then the certainty factor associated with the combination would be a
number.

This sort of a thing suggests that it will not be a number in
general. Even if you start with two numbers to begin with, the result
will be an interval-valued number if you have no fuzziness in the
picture.
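The point that two numbers combine into an interval can be sketched in Python for the consequent conjunction case. The bounds used are max(0, q1 + q2 - 1) and min(q1, q2), as stated in the talk for Q1 = Q2 = most; the sample proportions and the interval chosen for "most" are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]


def conjunction_bounds(q1: float, q2: float) -> Interval:
    """Bounds on the proportion of A's that are both B's and C's,
    given that q1 of the A's are B's and q2 of the A's are C's."""
    return (max(0.0, q1 + q2 - 1.0), min(q1, q2))


def conjunction_bounds_interval(q1: Interval, q2: Interval) -> Interval:
    """Same bounds when each input proportion is itself only known to an interval."""
    return (max(0.0, q1[0] + q2[0] - 1.0), min(q1[1], q2[1]))


point_result = conjunction_bounds(0.8, 0.8)
most = (0.75, 0.95)  # hypothetical interval reading of "most"
most_result = conjunction_bounds_interval(most, most)

print(point_result)
print(most_result)
```

Starting from the two numbers 0.8 and 0.8, the result is already an interval of width about 0.2; starting from intervals, it is wider still.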


If you do have fuzziness, then it will be a fuzzy interval-valued
number. So this compositionality then is lost. And because it is lost,
the computations very quickly become uninformative. I think that the
long chains of computations that are allowed in MYCIN are really not
justified. There is another problem that I want to mention in
connection with MYCIN. There is one serious flaw as far as I can see
and that is the certainty factor is taken to be the difference between
the measure of belief and the measure of disbelief. That means that you
have certain supportive evidence which tends to lead you to the
conclusion that the hypothesis is true; there is another body of
evidence which tends to lead you to the conclusion that it is not true.

If you compute the two and subtract one from the other the net
difference is taken to be the certainty factor for the conclusion. That
sort of a thing can lead to the following highly counter-intuitive
situation:

You have 100 witnesses testifying and 99 of them say that the
defendant is guilty and one of them says that the defendant is not
guilty. The one negative vote completely nullifies the 99 positive
votes.
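The witness example can be reproduced with a simplified Python sketch of a certainty factor taken as belief minus disbelief. The sketch assumes the commonly cited incremental update total + s * (1 - total) for pooling evidence on each side, and a hypothetical strength of 0.9 for every witness; it is an illustration of the difference scheme described above, not a reimplementation of MYCIN.

```python
from typing import Iterable


def pool(strengths: Iterable[float]) -> float:
    """Pool same-direction evidence: each item closes part of the remaining gap to 1."""
    total = 0.0
    for s in strengths:
        total = total + s * (1.0 - total)
    return total


def certainty_factor(supporting: Iterable[float], opposing: Iterable[float]) -> float:
    """Certainty factor as measure of belief minus measure of disbelief."""
    return pool(supporting) - pool(opposing)


STRENGTH = 0.9  # hypothetical strength assigned to each witness

cf_one_each = certainty_factor([STRENGTH], [STRENGTH])
cf_99_vs_1 = certainty_factor([STRENGTH] * 99, [STRENGTH])

print(f"1 for, 1 against:  CF = {cf_one_each:.3f}")
print(f"99 for, 1 against: CF = {cf_99_vs_1:.3f}")
```

Under these assumptions the pooled belief saturates near 1 after a few witnesses, so the 98 additional supporting witnesses raise the certainty factor only from 0 to about 0.1, while the single dissenter removes 0.9.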


In other words, evidence is not cumulative in MYCIN. This is
what happens, which is counter-intuitive. So the situation then is this
as far as I can see. To summarize what I have said is that I believe
that Dempster-Shafer theory is a very useful theory. It is a very
interesting theory from the theoretical point of view and it is a very
useful theory from the practical point of view.




It is certainly a step in the right direction. I think that by
itself, however, it is not sufficient; that is, you have to have many
other rules, of combination, for inference and so forth, in order to be
able to deal with the problems that one encounters with inferential
processes in expert systems.

I think probability theory by itself is likewise insufficient.
That is, one has to have at one's disposal probability theory and
possibility theory and use the two of them, generally in combination.
In that way you arrive at answers whose precision is commensurate with
the imprecision of the information in the knowledge base.

You do not have the kind of artificial precision that you get out
of existing expert systems. MYCIN, PROSPECTOR and so forth give you
numbers which are misleading because there is really no justification
for that high degree of precision if you use any kind of established
theory, be it probability theory or some other theory.

All of these theories will lead you to the conclusion that what
you can assert about the certainty factor of the final answer is much
less specific than the existing expert systems would give.

Thank you very much.

DR. DEGROOT: Thank you very much.

(Applause) 


MANIPULATION OF FUZZY QUANTIFIERS

60% of students are single
60% of single students are male

60% x 60% of students are single males

most students are single

a little more than a half of single students are male
(most) ⊗ (a little more than a half) of students are male

(most) ⊗ (a little more than a half) ≈ about a half
rounding to the nearest comprehensible linguistic term.


SYLLOGISMS

p1 -- major premise
p2 -- minor premise
p  -- conclusion

disposition d1 --(restoration)--> proposition p1(Q1): fuzzily quantified premise
disposition d2 --(restoration)--> proposition p2(Q2): fuzzily quantified premise
disposition d  <--(suppression)-- proposition p(Q): fuzzily quantified conclusion

compositionality <-> Q = f(Q1, Q2) independent of p1 and p2

most students are undergraduates
most undergraduates are young

[Slide: ΣCount identities; partly illegible in the source]

ΣCount(C/A∩B) = ΣCount(C/A) ΣCount(C/B) δ

δ = ΣCount(A) ΣCount(B) / (ΣCount(A∩B) ΣCount(C))

[The remainder of the slide, which appears to restate this relation in
ratio form, is illegible in the source.]


INTERSECTION/PRODUCT SYLLOGISM

Q1 A's are B's
Q2 (A and B)'s are C's
(Q1 ⊗ Q2) A's are (B and C)'s
≥(Q1 ⊗ Q2) A's are C's
(Q1 ⊗ Q2) A's are C's    if Q1, Q2 monotonic

most^2 students are young


NEGATION SYLLOGISM

Q A's are B's
≥(1 ⊖ Q) A's are not B's

ΣCount(¬B/A) ≥ 1 − ΣCount(B/A), with equality if A is nonfuzzy

few tall men are not fat
≥(1 ⊖ few) tall men are fat


ANTECEDENT CONJUNCTION SYLLOGISM

Q1 A's are C's
Q2 B's are C's
?Q (A and B)'s are C's

ΣCount(C/A) is Q1
ΣCount(C/B) is Q2
ΣCount(C/A∩B) is ?Q

NO ASSUMPTIONS: Q = [0,1]

MYCIN: Q = Q1 + Q2 − Q1 Q2
(Assumptions = ?)

PROSPECTOR: Assumption:

ΣCount(A∩B/C) = ΣCount(A/C) ΣCount(B/C)


CONSEQUENT DISJUNCTION
SYLLOGISM

Q1 A's are B's
Q2 A's are C's
Q A's are (B or C)'s

Q1 ∨ Q2 ≤ Q ≤ 1 ∧ (Q1 ⊕ Q2)

CONSEQUENT CONJUNCTION

0 ∨ (Q1 ⊕ Q2 ⊖ 1) ≤ Q ≤ Q1 ∧ Q2


NEGATION

p        --> not p

X is F   --> not (X is F)
             X is not F

Q A's are B's: ΣCount(B/A) is Q


DUALITY

NEGATION

Q A's are B's
(1 ⊖ Q) A's are (not B)'s

Q A's are B's
≥(1 ⊖ Q) A's are (not B)'s
(1 ⊖ Q) A's are (not B)'s

Q is monotone decreasing

CONSEQUENT CONJUNCTION SYLLOGISM

Q1 A's are B's
Q2 A's are C's
?Q A's are (B and C)'s

0 ∨ (Q1 ⊕ Q2 ⊖ 1) ≤ Q ≤ Q1 ∧ Q2

Q1 = Q2 = most

0 ∨ (2 most ⊖ 1) ≤ Q ≤ most


CHAINING SYLLOGISMS

containment (B ⊂ A)

Q1 A's are B's
Q2 B's are C's
(Q1 ⊗ Q2) A's are C's

most A's are B's
most B's are C's
most^2 A's are C's

most students are undergraduates
most undergraduates are young
most^2 students are young


CHAINING SYLLOGISMS

MAJOR PREMISE REVERSIBILITY

Q1 A's are B's  <-- major premise reversible
Q2 B's are C's
≥(0 ∨ (Q1 ⊕ Q2 ⊖ 1)) A's are C's

most American cars are big
most big cars are gas guzzlers
(2 most ⊖ 1) American cars are gas guzzlers

TRANSITIVITY OF CONTAINMENT


FUZZY SYLLOGISMS

Q1 A's are B's
Q2 C's are D's
?Q3 E's are F's

most young women are slim
many slim women are attractive
?Q young women are attractive

Chaining:                B = C, E = A, F = D
Consequent conjunction:  A = C, E = A, F = B ∩ D
Antecedent conjunction:  B = D, E = A ∩ C, F = B
Intersection/product:    C = A ∩ B, E = A, F = C ∩ D

Disjunctive syllogisms:
    consequent conjunction
    antecedent conjunction
    chaining


PRODUCT RULE

Q1 U's are A's
Q2 A's are B's
(Q1 ⊗ Q2) U's are (A and B)'s
≥(Q1 ⊗ Q2) U's are B's
(Q1 ⊗ Q2) U's are B's    (monotone Q or B ⊂ A)

most students are undergraduates
many undergraduates are freshmen
(most ⊗ many) students are freshmen


EXAMPLE

Intersection/product syllogism

Q1 A's are B's
Q2 (A and B)'s are C's
(Q1 ⊗ Q2) A's are (B and C)'s

almost-all A's are B's
all B's are C's
all (A and B)'s are C's

(almost-all ⊗ all) A's are (B and C)'s
(almost-all) A's are (B and C)'s
≥ almost-all A's are C's
almost-all A's are C's (if monotonic)


DISCUSSION ON PRESENTATION OF LOTFI ZADEH


DR. DEGROOT: Art, do you have any comments you would like to
make?

DR. DEMPSTER: As the first discussant, I suppose it's my task to
introduce the ferocious battles that Morris was alluding to but I hope I
will disappoint you on that.

I agree that in the real world there are many logics used and
most of them are pretty fuzzy. In all different disciplines we reason
in many different ways.

I certainly found everything that Lotfi said great and very
appealing as heuristics and I think it's a fascinating field of study to
think in terms of these heuristics, which is pretty much the way I think
of what I've been listening to.

I might just mention this. On the side, since the words
"representation" and "usually" and things came up, a colleague of mine,
Fred Mosteller, wrote a long sequence of papers with Bill Kruskal in the
Review of the International Statistical Institute on the concept of
representation and what it means, and so on.

Fred has had a long interest in writing on sort of the heuristic
side. I think Augustine (Kong) is involved in such a project at the
present time, and he might be able to lead to some kinds of discussions
you would be interested in.

My own need, however, is more in the direction of a need to
defuzzify fuzzy logic. By that, I mean not taking the word "fuzzy" out
of it since that's the essence of it, but somehow clarifying the
concepts of fuzziness.

Something that sticks in my mind is that R. A. Fisher, who is
sometimes thought to be an obscure person, says someplace (and I'm
just paraphrasing it) the wonderful thing about probability is that
you can reason very precisely. The paradox is dealing with uncertainty
in very precise terms and I think that's sort of what my focus is mainly
on.


So I feel, as I'm sure Glenn does, a need to understand the
mathematics. I think Glenn has made an advance in that in a recent
technical report he goes over some of the concepts of marginalization
and extension and combination and tries to relate the possibility
measures to belief functions in various ways. I'm sure he will talk
about that.


I suppose this can be debated. By being precise, of course, you
get down to narrow models that are not going to stand up in the long
run, but if you're not precise you sort of wind up going around and
around the mulberry bush and not getting anywhere. So this is the kind
of problem that troubles me.

The other thing that I think we need to do is to take some of the
examples that Lotfi has brought up in recent writing and also in the
draft paper (that he didn't follow precisely) and try to analyze them,
not from my language or from his language, but from some language that
we both understand as a common scientific approach to problems and see
where that kind of thing tries to lead.

Again I believe Glenn has discussed several of Lotfi's examples
in his recent technical report and we need to do more of that in order
to try to bridge the gap.

I enjoyed the presentation very much and some of the intuitive
principles, like fuzziness getting worse the more things you try to
combine, things of that sort which certainly have to be true but
technically I think we need to get down to specifics of what the
mathematics is and how to analyze examples in a common way.

DR. WATSON: I always enjoy hearing Lotfi speak because it makes
me aware of how poor my understanding of the subject is when he manages
to get so many different ideas into but a short frame of time.

I think at this time I'll just run through the Vu-graph I
prepared and it addresses, rather than the issues raised in his talk,
this new concept of usuality which struck me as being a very interesting
one which I need to go away and think about.

Rather than do that, I'll go back to some of the basics which
some of you maybe wanted to ask about fuzzy set theory.

We've been clear that it's a theory of imprecision
rather than uncertainty and as Lotfi said, he sees it as being a
companion to probability theory rather than a replacement for
probability theory. I think that's an important point to make, so if
we're using it in artificial intelligence systems we are using it in
parallel with probability theory.

Of course, again one could back off from that and people have
often asked this question and, fine, I can see that imprecision could be
thought of as being a different concept from uncertainty, but if you're
imprecise I'm uncertain and so it is always possible for imprecision that
I find in the real world, for me to describe it personally in terms of
uncertainty.

I think Lotfi mentioned this. He said he took it as a decision,
an analytic decision, that he was not going to follow that route
although he recognizes it's a route that could be followed.


I think it is important to recognize that that is a parting of
the ways between the fuzzy and the Bayesian approaches to these things.

Now there are and always have been a whole lot of problems which
people new to fuzzy set theory raise when looking at it. Lotfi has
mentioned these already.

Where do the numbers come from? Well, in this context you can
get back to the argument Glenn has been making a lot that what you need
in talking about where the numbers come from is to compare something
with canonical examples. In the Bayesian theory you've got the
canonical example of balls and urn. In Shafer's theory you've got the
canonical examples of the meaning of evidence and so on.

The fact that there aren't such in fuzzy set theory at present
(although I may be wrong there) is a cause of concern.

Thara  is  also  tha  problao  how  you  do  actually  raprasont  fussy 
cats  which  stand  for  scaathing.  Lotfi  put  up  aoaa  alidos  saying  fussy 
sots  naant  usually.  Hall.  I  could  praaunably  ooaa  up  with  anothar  fussy 
sat  which  looked  siallar  which  was  also  supposed  to  raprasont  usually. 

I don't have time to go into this. I suspect that the comeback
from Lotfi would be it shouldn't matter just precisely what these fuzzy
sets should look like, 'cause after all it's the theory of fuzziness.

Well, I don't know of any detailed studies which have looked at
the implications for the output of the fuzzy analysis of
changing the input numbers or the shapes of the input numbers.

If the outputs are sensitive to these things, then it's crucial
to know precisely where to get them from. If the outputs are
insensitive I wonder whether the outputs say anything at all, but that's
something we can come back to.

Another thing that is very often asked is why these particular
connectives, why the max and the min. It's now well known that there is
a whole host of connectives which could describe the connective, the
and, and the or, which have the properties that you require, namely in
the case of crispness they correspond to traditional logic but in the
case of fuzziness they don't.
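As a minimal sketch of the point about the host of possible connectives, the Python below compares two pairs of "and"/"or" operators: min/max, and product/probabilistic sum. The second pair is chosen here only as a familiar alternative; it is an assumption of this sketch, not something named by the speaker. Both pairs reproduce traditional logic on crisp values (0 and 1) but give different answers on intermediate grades.

```python
# Two candidate pairs of fuzzy connectives. The product / probabilistic-sum
# pair is an illustrative assumption; the discussion only says that many
# alternatives to max and min exist.

def and_min(a, b):
    return min(a, b)

def or_max(a, b):
    return max(a, b)

def and_product(a, b):
    return a * b

def or_probsum(a, b):
    return a + b - a * b

CRISP_PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def agree_on_crisp():
    """True if both pairs match classical AND/OR on every crisp input."""
    for a, b in CRISP_PAIRS:
        classical_and = int(bool(a) and bool(b))
        classical_or = int(bool(a) or bool(b))
        if not (and_min(a, b) == and_product(a, b) == classical_and):
            return False
        if not (or_max(a, b) == or_probsum(a, b) == classical_or):
            return False
    return True

if __name__ == "__main__":
    print("agree on crisp inputs:", agree_on_crisp())
    a, b = 0.7, 0.6  # illustrative membership grades
    print("and:", and_min(a, b), "vs", round(and_product(a, b), 2))
    print("or: ", or_max(a, b), "vs", round(or_probsum(a, b), 2))
```

On grades such as 0.7 and 0.6 the two "and" operators disagree, which is why the choice of connective needs some justification.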

And there are, I know the existence of, though I haven't studied,
some studies which go into why one should use max and min. Maybe Lotfi
could come back to that at some stage.

I don't think that is satisfactory. I've mentioned the
interpretation there, but I think the last thing is something that is
important and I think it does relate to artificial intelligence:
blandness. Lotfi has suggested that in fact this is a virtue of using
some kind of fuzzy analysis.




The fact is that after a while really the imprecision is so great
that you can't really say anything, and I wonder whether that sort of
blandness is something we actually do want in our artificial
intelligence expert systems.

I suspect that the greater degree of representation of
uncertainty which a Bayesian theory affords might be a virtue rather
than a vice.

There are positive points I see in fuzzy set theory. As I've
mentioned before, I don't hold with the view that the only way to handle
uncertainty must be to use probability theory. As I said in my previous
little speech, you can decide to not go along with the axioms that
support subjective probability and refuse to place the bets, refuse to
go in for Dutch books, and I don't see any reason why you shouldn't do
that.

The positive points I see, therefore: as Art just said, you can
think of it as a heuristic to support the way you think, but as a
representation for natural language, it seems to me it has a lot going
for it, and that's what artificial intelligence systems are trying to do.
As a tool for imprecise implication it seems to me it's a very sensible
thing, but again I think of it as a heuristically reasonable thing to do
rather than something which we must do out of some logic or necessity.

Perhaps the combination which Lotfi talks about, the
representation of imprecision about probabilities, is the one which
attracts me most. Well, I think I've said enough.

DR. DeGROOT: Thanks very much.

DR. ZADEH: First let me comment on a point that Professor
Dempster made in his comments, and that has to do with the desirability
of having, let's say, some sort of rigorous mathematical foundation for
the theory, or something that goes beyond talking about these things in
a more or less qualitative fashion.

I do feel that there is such a need, but I also feel that there
are limits to which one can aspire to attain that objective.

In other words, I think that as people become older, they begin
to become more conscious of the limitations of what we can do physically
or intellectually.

The same applies I think to science. Initially we have these
grandiose ideas that we might be able to construct very simple models of
the universe (I think Einstein was driven by this sort of thing), that
just one equation will explain everything, that we could have theories
of probability that would tell you exactly what probability is, and so
forth.


Gradually I think, or sooner or later, it begins to dawn upon
people that these may be unrealizable objectives, that some of the
questions, some of the issues in probability here that animated
Bernoulli and Laplace and people like that are still with us. They have
not really been answered.

It's quite possible that we have to lower our sights and be
satisfied with theories which do not answer all of these questions
completely, and that is where the concept of a disposition again comes
into the picture.

Dispositions, as I said, are assertions which are preponderantly
but not universally true. Now mathematics is very allergic to
dispositions. You cannot write a paper that would be accepted for
publication in a respectable mathematical journal in which the
conclusion would be in the form that usually something works or usually
it's true. You cannot do that sort of thing. It has to be true, or it
is not true. That's all there is to it.

Now if we adhere to that kind of a standard, then we are
essentially shutting ourselves off from all sorts of human knowledge, and
we also make it impossible to deal with expert systems, because it's
impossible, I think, to come up with a theory that completely and
satisfactorily answers all of the conditions that arise.

In lowering the sights, however, you retreat as little as
possible. I'm not suggesting that we retreat all the way to philosophy
or something where you just wave your hands, but rather you then do what
is done in fuzzy set theory, and that is you allow truths which are
partial truths, you allow probabilities that are specified
linguistically (likely, unlikely, very likely and so forth), you even
allow these membership functions where the question of how do you find
that value, .8, how do you do it rigorously, cannot be answered
perhaps.

You live with this sort of thing. You say, well, there are
limits to the process of precisiation, and so long as I can come up with
conclusions that in some sense make it easier for me to arrive at the
decision, or diagnose the disease, or understand natural language, or
do a number of other things, I'll be satisfied even though it may
stop short of a complete explanation in the traditional spirit.

I think this is a point that has to be emphasized, that as I said
in a number of places when I wrote on fuzzy sets, it represents a
retreat, and it represents also an attempt at finding an accommodation
with the pervasive imprecision of the real world.

In other words, you're saying that the sights we set for
ourselves, the goals we set for ourselves, are unattainable. You have
to retreat a little bit.




Incidentally, let me just say one thing in response to a question
about independence. I think Glenn answered the question in such a way
that again may not be satisfactory in some sense to the people who
expect a rigorous thing, that is, that we can define it probabilistically.
That's what Professor Dempster did in his paper and what Glenn used in
his paper. In other words, you define independence, but when it comes
to a practical situation, if somebody asked you a question "are these
bodies independent or not?" at that point we really don't have criteria
which allow us to answer the question. I think this is what Glenn had
in mind when he said you have to use your intuition, you have to use
heuristics, and so forth. This is what it boils down to.

So the connection between theory and reality, that connection
becomes a fuzzy one.

To go back to comments made by Stephen. As usual I think Steve
was very succinct in his points.

I  think  that  a  useful  application  of  fuzzy  sets  is  in  the 
characterization  of  probabilities.  In  other  words,  you  take  probability 
theory as it is, you don't modify it in any way, but you allow the
probabilities  to  be  fuzzy,  which  is  a  generalization  of  interval  valued 
probabilities. 

Now to say the probabilities are fuzzy is not the same as saying
that you're dealing with second order probabilities. Many probabilists
don't like second order probabilities.

I think when you say that something is likely, when you say
things of this kind, you are really using a possibilistic
characterization of what is basically a probability. In many situations
our knowledge of probabilities is not really good enough to enable us to
justify the use of numbers. We simply don't know that much about real
world probabilities, and in fact if we put aside urns and cards and
things of this kind that are used as canonical examples in texts on
probability theory, I think that most real world probabilities are not
measurable.

The example that I use to illustrate that point is the following
one. I use that example because I saw it in a textbook on operational
analysis and they cited that as an example of an application of
operations research type of approaches.

They said, well, suppose that you want to decide whether or not
to insure your car, and so what do you do? Well, you have to take into
consideration what's the probability that it might be stolen and various
other things, and they assume they have numbers for various things, and
they assume that you know there is a probability that the car might be
stolen of .001 or some such number.


The question is where do you get that number from? My contention
is that it is not possible to measure or find that number, that if
somebody asks the question "what's the probability that your car might
be stolen?" you cannot answer that question on any level, theoretically
or empirically or any level whatsoever. The reason you cannot answer is
that it is a unique sort of thing.

The information that is available about the theft of cars in the
District of Columbia is not that relevant to the question of what's the
probability that your car might be stolen, because it's a particular
car.


So this is the old problem of unique things. In other words, you
have much more information; the probability that you need is
conditioned on all sorts of things, whereas the probability that you have
is not conditioned on those things, and there is no connection between
the two that can do you any good.

The problem is this, and this is by no means an artificial
example: I think if you look at real world probabilities, you will
find that most of them are like that. Most of them are not measurable,
so that our perception that probabilities are well defined, that they can
be measured, is not realistic, not realistic at all.

The subjective point of view, where you relate probabilities to
betting behavior, in my judgment is also not satisfactory because it
merely tries to explain one thing in terms of another thing which is
just about equally ill-defined.

In closing then, let me say this, that there are problems having
to do with the measurement of grades of membership, which I however don't
regard as serious a problem as some people do, and in the use of
connectives, but all it boils down to is this: in effect it says that the
real world is too complicated for simple theories. You cannot do that.
You have to have a language which is sufficiently expressive to enable
you to deal with imprecisely defined probabilities, with situations in
which "and" in one context has one meaning and in another context has
another meaning (conjunction does not really have a fixed meaning),
situations where implication does not have a fixed meaning, situations
in which various predicates do not have fixed meanings, and so forth.

You  have  to  accept  that.  You  have  to  accept  that  and  you  have  to 
lower  your  sights  and  be  satisfied  with  conclusions  which  are  not  quite 
as  precise  as  those  that  we  expect  of  traditional  theories.  Thank  you. 

DR. DeGROOT: Are there comments from the floor, or questions?

DR. BROWNSTON: Lee Brownston from Carnegie-Mellon University.

I have a question which goes beyond fuzzy set theory and
possibility theory and touches on the theory of belief functions as well
as the nature of expert systems.




What are your ideas on whether these theories are normative
versus descriptive? How do you validate them if you think that they are
descriptive, and if they are normative how do you justify using one
particular set of operators as opposed to another?

DR. ZADEH: Well, here's the situation. There are many rules of
inference. For example, the composition rule for inference, and various
other rules of inference.

These rules of inference can be tested in examples which are
sufficiently simple to enable us to use our intuition. To the extent --
I usually test these things to satisfy myself that I'm not off on the
wrong track.

Generally, and I haven't found exceptions to this thing, in the
case of simple examples, canonical examples, they tend to agree with our
intuition, which then sort of encourages you to apply them to examples
which are not so simple as to make it possible for us to use our
intuition.

Here's the situation. In many cases these examples are such that
you cannot use probability theory and things of this kind to come up
with answers to those problems.

In other words, if I said something like usually X is F and
usually something-something, and then I ask somebody in probability
theory, okay, what can you tell me? A probabilist couldn't do it.

Here I disagree a little bit with Professor Lindley because
Professor Lindley felt that probability theory, Bayesian theory, can
handle all of these problems.

My test then is to give a number of problems like that and say,
okay, go ahead, solve, so we do not have here the comfort of being able
to use some other theory and to be able to compare the results.

Now in general I tend to be somewhat wary of normative theory
because I think that normative theories, when they tend to disagree with
human intuition, upon further inspection turn out to contain some
assumptions that may not be warranted or some other things, so I tend to
feel that if there is disagreement the chances are it's the theory
that's wrong rather than human intuition.

Of  course  there  are  cases  where  that  is  not  so.  Whatever  I  say 
is  the  disposition.  In  other  words,  that  is  usually  the  case  but  not 
always  the  case. 

I take also some issue with Professor Tversky's examples. There
is one example which suggests that people may assign higher probability
to A than to B if A is a subset of B. He feels that this is
counter-intuitive but to me it is not at all counter-intuitive. In fact, I
have a transparency to show that. It's simply a matter of how you
interpret these things.




People  interpret  the  question  in  such  a  way  that  they  answer  in 
terms  of  probability  of  A  given  B  even  though  the  person  that  asks  the 
question expects the answer to be a probability of B given A, so
the  probability  of  B  given  A  tends  to  be  counter-intuitive,  but  if  you 
interpret  the  answer  as  that  to  the  question  what's  the  probability  of  A 
given  B,  then  it's  perfectly  intuitive. 

So  I  tend  to  shy  away  from  any  pretense  to  normativeness. 

Another  thing  which  I  have  some  reservations  about  is  the 
principle  of  maximization  of  expected  utility  which  is  preconservative 
as  the  normative  principle. 

DR. DeGROOT: Nozer?

DR. SINGPURWALLA: First a general comment. I've heard the word
"Bayesian"  used  here  several  times.  I  believe  everybody  who  has  used  it 
has  in  mind  the  ordinary  calculus  of  probability,  just  the  way  we've 
learned  it.  You're  not  referring  to  Bayesian  inference  per  se,  you're 
just  referring  to  a  use  of  the  ordinary  calculus  of  probability. 

Now  to  the  question.  Professor  Zadeh,  I  sensed  a  kind  of 
inconsistency  in  one  of  your  statements.  You  used  the  words 
"possibility  of  a  probability"  somewhere,  and  then  later  on  you  said 
that  the  notion  of  probability  was  not  clear,  or  at  least  was  not 
complete.  You  also  said  that  the  notion  of  subjective  probability  was 
imprecise.  If  that  be  the  case,  what  did  you  have  in  mind  when  you  said 
"possibility  of  a  probability?" 

DR. ZADEH: First let me respond to the first part of your
comment. 

I think that the term Bayesian is used in two different senses.
The sense in which it's used by people who don't know too much about
probability theory, people in AI and so forth: when they say Bayesian
they mean the application of the rules of probability theory.

DR.  SINGPURWALLA:  That's  really  what  I  was  trying  to  emphasize. 

DR.  ZADEH:  I  think  this  is  not  the  sense  in  which  Professor 
Lindley  would  use  the  term  Bayesian.  There  it  has  to  do  with  ratio  of 
subjective  probability,  what  you  do  if  you  don't  know  probabilities,  and 
so  forth,  the  frequentist  interpretation  versus  the  Bayesian  point  of 
view,  and  so  forth. 

That  gets  into  different  issues.  Nobody  will  question  the  use  of 
the  formula  "probability  of  Y  is  the  integral  probability  of  Y  given  X." 

This  is  not  the  sort  of  thing  we  are  talking  of. 
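Written out, the formula quoted above is presumably the law of total probability. The spoken version leaves the weighting by the distribution of X implicit; the density p(x) below is therefore an assumption of this reconstruction, not something stated in the transcript.

```latex
P(Y) \;=\; \int P(Y \mid X = x)\, p(x)\, dx
```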

Then  if  I  use  the  term  Bayesian  then  it  depends  really  who  I'm 
talking  to.  If  I'm  talking  to  AI  people  I'm  using  it  in  this  first 
sense. If I'm talking to people who are really probabilists then I am
using  it  in  the  second  sense.  That's  the  differentiation  that  one  has 
to  make. 


Now with respect to the second point, could you just run over
again  what  — 

DR. SINGPURWALLA: You used the term "possibility of a
probability."  You  also  said  the  term  probability  was  very  unclear;  so 
what  exactly  did  you  have  in  mind? 

DR.  ZADEH:  Okay,  here's  the  situation.  Probability  theory  by 
itself  is  a  very  precise  theory.  The  imprecision  comes  when  you  want  to 
relate  that  theory  to  the  real  world.  It's  in  the  interpretation  of 
symbols  and  various  things  that  the  difficulty  arises. 

This  issue  is  usually  avoided  in  texts  on  probability  theory.  In 
other  words,  if  you  read  a  typical  book  on  probability  theory  there  will 
be  practically  no  discussion  of  subjective  probability  or  things  of  this 
kind.  This  is  an  issue  that's  avoided. 

Now  in  any  theory,  in  any  theory,  you  have  that  problem.  It's 
the  problem  of  correspondence.  It's  the  semantics  of  the  theory.  This 
is  really  what  it  boils  down  to,  and  questions  that  Steve  raised  related 
to  the  semantics  of  this  theory  —  what  do  you  mean  by  .8,  what  do  you 
mean  by  this,  what  do  you  mean  by  that. 

Now in the theory -- in fuzzy logic, since probability and
possibility are under the same roof, it's perfectly okay to raise the
question "What is the possibility of probability," "What's the
probability of possibility," and so forth.

So  if  I  said  that  all  I  know  is  that  a  certain  probability 
distribution  lies  in  a  certain  set  --  in  other  words,  you  have 
incomplete  information  —  for  example,  it's  the  set  of  normal 
distributions  with  certain  variance,  where  the  mean  is  between  alpha  and 
beta. 

That's a class of probability distributions. Now that class is
the  possibility  distribution  for  probability  distribution.  You  say 
"what  are  the  possible  probability  distributions?" 

Now  as  I  said,  in  the  case  of  possibility  theory,  possibility  is 
a  matter  of  degree,  so  if  I  said  —  instead  of  saying  it's  normally 
distributed  with  the  mean  between  alpha  and  beta,  if  I  said  that  the 
mean  value  is  close  to  five,  that  parameter  is  a  fuzzy  parameter  and  as 
a  result  of  that,  that  possibility  distribution  will  become  —  it's  a 
fuzzy  sort  of  a  thing  so  I  am  dealing  with  a  possibility  distribution  of 
probability  distributions  with  that  possibility  distribution  being  a 
matter  of  degree.  This  is  what  it  means. 
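A minimal sketch of the two cases just described, under stated assumptions: each candidate probability distribution is a normal distribution with a fixed variance, indexed by its mean; the interval bounds (alpha = 4, beta = 6) and the triangular shape used for "close to five" are hypothetical choices made only for illustration. In the crisp case every candidate is either possible or not; in the fuzzy case possibility becomes a matter of degree.

```python
def crisp_possibility(mean, alpha, beta):
    """Possibility 1 for every mean in [alpha, beta], 0 otherwise."""
    return 1.0 if alpha <= mean <= beta else 0.0


def close_to_five(mean, spread=1.0):
    """Hypothetical triangular membership for 'the mean is close to five'."""
    return max(0.0, 1.0 - abs(mean - 5.0) / spread)


def possibility_distribution(means, degree):
    """Map each candidate normal distribution (indexed by its mean) to a degree."""
    return {m: degree(m) for m in means}


# Candidate means; each stands for one normal distribution of fixed variance.
candidate_means = [4.0, 4.5, 5.0, 5.5, 6.5]

# Crisp case: all that is known is that the mean lies between alpha and beta.
crisp = possibility_distribution(
    candidate_means, lambda m: crisp_possibility(m, 4.0, 6.0)
)

# Fuzzy case: the mean is "close to five", so possibility is graded.
fuzzy = possibility_distribution(candidate_means, close_to_five)

if __name__ == "__main__":
    for m in candidate_means:
        print(f"normal with mean {m}: crisp={crisp[m]}, fuzzy={fuzzy[m]}")
```

A different shape for "close to five" would give different degrees; the sketch does not claim any particular shape is the right one.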

I think, and this is what I did in my 1979 paper on fuzzy sets
and information granularity, where the Dempster-Shafer theory was
generalized to situations in which the sets that you have are fuzzy sets
and the basic probability numbers are fuzzy probabilities. That's the
generalization that was given in that paper.


DR. SOLAND: I'm Richard Soland from George Washington
University, and very naively I would like to come back to the question
of semantics because it seems to me that one of the benefits of the
fuzzy set approach supposedly is keeping things in natural language, but
I think that's perhaps a danger also in that people don't always
understand the same things when they use natural languages. Sometimes
it's cultural and sometimes it's individual.

I  wondered  to  what  extent  this  can  have  an  effect  on  the 
operational  nature  of  the  theory. 

Too often people, when they deal with somewhat quantitative
problems in a semantic way, tend to be careless in being imprecise,
that is, not thinking clearly and carefully, and will perhaps say that
usually it takes such-and-such an amount of time without thinking about
that clearly enough to be precise, even in the sense of possibility and
fuzzy set theory. I think in a lot of our analysis work we attempt to
be quite quantitative in our modeling in order to put precision in,
where lack of precision may cause errors in the analysis.

I wonder what kind of dangers might come into the analysis
because of individual and cultural differences perhaps in implementing
this theory.

DR. ZADEH: That's a really good point. I think that there is a
great deal of misunderstanding there when it comes to the issue of
meaning, understanding natural languages.

What is not sufficiently differentiated is the problem of
understanding on the one hand and the problem of representation on the
other hand.

The point of view that I take here is that the approach relates
to representation of meaning, rather than to understanding of meaning.
It's a language that allows you to represent a meaning, so if you say
something to me and I ask the question what do you mean by that, I
will not try to figure out what you mean. I will ask you the question
what do you mean by that, and then this is the language that enables you
to represent a meaning.

Now one of the transparencies showed, and it's sort of related to
the question here, what do you mean by usually, and it leads to the
question that Prof. Dempster raised, so what I tried to do then is
something like the following.

I ask the question what do you mean by F? For example, usually X
is small. I say what do you mean by small. So you say small is this.
I ask you what do you mean by small. I don't try to
guess. Then I ask what do you mean by usually. You say usually is this.


Now notice I allowed the usually to be fuzzy, so if your
perception of usually is so poor that you cannot really draw a curve
like that, you can draw something very fuzzy.

Now once you have explained to me what is meant by usually and
what is meant by small, then I will take these two and I will go through
the procedure which enables me to find what is the meaning of usually X
is small.

So semantics basically is nothing more than the composition of
the meaning of a complex entity from the meanings of its constituents.
I'll supply the meaning of the whole thing. That's really what it boils
down to.


Once the meaning of usually X is F is made more precise, this is
the precisiation of meaning, then I can reduce this thing to a problem in
nonlinear programming or something like that.

Until then, I cannot do it because I really don't know what's
meant by usually X is F, and that is where classical probability theory
will falter, because classical probability theory does not provide a
language for the representation of the meaning of things like usually X
is F. That's really what it does not do.

Let's take a simple problem. Suppose I say an urn contains a
hundred balls, of which 40 are black and the rest are white. What's the
probability that the ball picked at random is black? So okay, you divide
one by the other and so forth.

But suppose I fuzzify the problem. Suppose I said that the urn
contains approximately a hundred balls of which several are big.
Instead of saying black and white I introduce something that is fuzzy,
like size, or large.

What's the probability that the ball drawn at random is large?
You'd be in trouble, because there is no mechanism for representing the
meaning of several, large, approximately one hundred, and so forth.
That's where the problem is going to arise.
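To make the contrast concrete, here is a minimal sketch. The crisp urn question reduces to a single division (the count of 40 black balls is illustrative). For the fuzzified question, the sketch handles only one ingredient, the fuzzy predicate "large", by taking the expected grade of membership of a ball drawn uniformly at random. That is one known way of defining the probability of a fuzzy event and is not necessarily the procedure the speaker has in mind; the membership function `large` and the ball diameters are hypothetical. The fuzzy quantities "approximately a hundred" and "several" are not modeled.

```python
def crisp_probability(favourable, total):
    """Classical urn probability: favourable outcomes over all outcomes."""
    return favourable / total


def large(diameter_cm):
    """Hypothetical membership for 'large': 0 at or below 2 cm, 1 at or above 6 cm."""
    return min(1.0, max(0.0, (diameter_cm - 2.0) / 4.0))


def probability_of_fuzzy_event(values, membership):
    """Expected membership grade of an item drawn uniformly at random."""
    return sum(membership(v) for v in values) / len(values)


# Illustrative ball diameters in cm; a real urn would list every ball.
diameters = [1.0, 2.0, 4.0, 6.0, 8.0]

if __name__ == "__main__":
    print("P(black), crisp urn:", crisp_probability(40, 100))
    print("P(large), graded membership:",
          probability_of_fuzzy_event(diameters, large))
```

Changing the shape of `large` changes the answer, which is the sensitivity question raised earlier in the discussion.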

DR. DeGROOT: Well, that stimulated the audience. Let me see
those hands again and I'll pick one. Stephen, I'll give you another
try.


DR. WATSON: Can I just come back briefly on that, Lotfi.

Supposing I give you what I mean by usually and what I mean by
big, why should I go along with whatever calculations you do on those
numbers, since I don't see that there's a framework of necessity which
makes it clear that those are the calculations I should do on these
numbers.




DR. ZADEH: That relates again to the issue that was raised
before at least tangentially, and that's that within the system you have
very few dogmas. In other words, there are default approaches. The
default approach would be like the one that I've indicated. In other
words, that's the standard approach.

However, if you want to interpret these things differently, if
you want to combine them differently, if instead of using
max-min you wanted to use some of the t-norms and so forth, it is
perfectly allowable within the theory.

In other words, at any point you can override what are standard
procedures in the theory and substitute something that in your judgment
is a more accurate representation of what you expect of this sort of
thing.

DR. WATSON: And how do I choose between such things if they give
different answers?

DR. ZADEH: Here's the situation then. What you have to do is
you have to make a study of these things. You have to have essentially
a collection of these tools together with some comments, say this works
well in this situation, and this has certain properties and this has
certain properties and so forth, but if you have some idea as to what
are the properties of these things then you pick the one that fits your
perception best.

In the absence of that sort of a thing you just use the standard
default rule that is within the system. An example of that would be the
definition of a connective, and there would be a standard rule there. If
you don't like it, if you feel that, well, this doesn't really accord
with what you have in mind, then use such and such a rule.

That's why it is open-ended in some sense. In other words, you
can substitute user-defined relations for whatever is stored in the
system.

DR. DeGROOT: I think we could go on discussing this for much
greater length but lunch is imminent.

I do point out that there will be more time for discussion. Keep
your questions in mind. There's an hour set aside this afternoon from
4:00 to 5:00 for general discussion.

I want to thank all the speakers this morning and the
discussants, and I want to commend Prof. Lindley for being so patient
and keeping quiet. But he knows that he gets first crack this afternoon
and I think that perhaps has something to do with it.

(Laughter.) 

(Luncheon  recess.) 


THE PROBABILITY APPROACH TO THE TREATMENT
OF UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
AND EXPERT SYSTEMS


by 

Dennis V. Lindley

Somerset,  England  and  George  Washington  University 


Talk given at a conference on the calculus of uncertainty in
artificial intelligence and expert systems, George Washington University,
27-28 December 1984. Supported by Grant DAAG29-84-K0160, U.S. Army
Research Office, and Contract N00014-77-C-0263, Project NR042-372,
Office of Naval Research, with The George Washington University.




1.  INTRODUCTION 


Our concern in this paper is not with a general discussion of
artificial intelligence (AI) and expert systems (ES) but with one
particular aspect of them, namely the occurrence of uncertainty
statements within AI or ES. We discuss how they should be made, what they
mean, and how they combine together.

Uncertainty  Is  obviously  present  In  most  ES  algorithms  because 
experts  can  rarely  be  totally  sure  of  the  statements  they  make.  Thus 
In  medical  ES,  the  presence  of  a  symptom  array  does  not  Invariably 
imply  the  presence  of  one  disease,  so  that  diagnosis  Is  Inherently 
uncertain.  Even  the  symptom  may  exhibit  uncertainty  for  doctors  may 
differ  In  their  Interpretations  (see  section  10) .  Prognosis  is  clearly 
even  more  uncertain.  When  discussing  purely  deterministic  procedures 
there  may  be  some  merit  In  Introducing  uncertainty.  For  example,  chess  Is 
a  game  with  perfect  Information  yet  AI  programs  sometimes  Incorporate 
uncertainty  as  a  way  of  avoiding  the  terrible  complexity  of  the  game. 

So  uncertainty,  whilst  perhaps  not  ubiquitous,  frequently  occurs.  Our 
task  Is  to  study  approaches  to  dealing  with  it  within  AI  and  ES. 

2.  THE  INEVITABILITY  OF  PROBABILITY 
Our thesis is simply stated: the only satisfactory description
of uncertainty is probability. By this is meant that every uncertainty
statement must be in the form of a probability; that several uncertainties
must be combined using the rules of probability; and that the calculus of
probabilities is adequate to handle all situations involving uncertainty.
In particular, alternative descriptions of uncertainty are unnecessary.
These include the procedures of classical statistics; rules of combination
such as Jeffrey's (1965); possibility statements in fuzzy logic, Zadeh
(1983); use of upper and lower probabilities, Smith (1961), Fine (1973);
and belief functions, Shafer (1976). We speak of "the inevitability
of probability."

3.  MATHEMATICAL  AND  PHYSICAL 
MEANINGS  FOR  PROBABILITY 

Before defending the thesis, it had better be made clear what
we mean by probability. Most emphatically, we do not just mean numbers
lying between 0 and 1: it is more interesting than that. There are
two ways of responding to a question about the meaning of probability.
One is to describe the concept mathematically. The other is to consider
its interpretation in the physical world. We consider both these
responses.

Mathematically, probability is a function of two arguments: the
event A about which you are uncertain, and your knowledge H when
you make the uncertainty statement. We write $p(A \mid H)$; read, the
probability of A, given H. The function obeys the three rules:

Convexity: $0 \le p(A \mid H) \le 1$, and $p(A \mid H) = 1$ if H is known by
you logically to imply A.

Addition: $p(A_1 \cup A_2 \mid H) = p(A_1 \mid H) + p(A_2 \mid H) - p(A_1 \cap A_2 \mid H)$.

Multiplication: $p(A_1 \cap A_2 \mid H) = p(A_1 \mid H)\, p(A_2 \mid A_1 \cap H)$.

We could elaborate on these rules: for example, by discussing
whether the events have to form a σ-field, whether the addition law
holds for an enumerable infinity of events, whether $p(A \mid H) = 1$ only
if H is known by you logically to imply A, and in other ways. But
these would merely add mathematical glosses to the key ideas that
probability lies between 0 and 1 and obeys two distinct rules of combination.
From these three rules, perhaps modified slightly, all the many, rich
and wonderful results of the probability calculus follow. They may be
described as the axioms of probability. We prefer not to describe
them this way because, as will be seen below, they can be derived from
other, more basic, axioms and consequently appear as theorems.

The interpretation of $p(A \mid H)$ is that it is your subjective
belief in the truth of A were you to know that H was true. It is
often referred to as subjective probability because it is ascribable
to a subject, you; and also to distinguish it from another use of
probability called frequentist or objective. This latter we shall call
chance, thus avoiding the adjective for probability. It is convenient
to think of $p(A \mid H)$ as a measurement: like a measurement of length
or temperature. It measures belief, not temperature. Like all
measurements it has a standard. We may take the simple example of balls in
an urn. For you, $p(A \mid H) = a$ if you are indifferent between receiving
a prize contingent on A, knowing H, and receiving the same prize
contingent on a black ball being drawn at random from an urn containing a
proportion a of black balls. Of course, other ways are possible. It
is a defect of many other approaches to the measurement of uncertainty
that they do not have a standard by which to judge their statements.
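
The three rules of this section can be checked mechanically on a small example. The following is only a minimal sketch and not part of the original talk: it assumes a hypothetical finite sample space (two rolls of a fair die, every outcome weighted equally) so that p(A|H) can be found by counting, and the events A1, A2, H and the function name p are illustrative choices. Counting is used merely to exhibit the mathematical rules; it is not meant to stand in for the subjective interpretation.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two rolls of a fair die (36 equally weighted outcomes).
OMEGA = frozenset(product(range(1, 7), repeat=2))


def p(A, H=OMEGA):
    """p(A | H): the share of the outcomes in H that also lie in A."""
    H = frozenset(H)
    if not H:
        raise ValueError("the conditioning event H must be possible")
    return Fraction(len(frozenset(A) & H), len(H))


# Illustrative events (assumptions made for this sketch only).
A1 = {w for w in OMEGA if w[0] % 2 == 0}       # first roll is even
A2 = {w for w in OMEGA if w[0] + w[1] >= 9}    # total is at least 9
H = {w for w in OMEGA if w[1] >= 3}            # knowledge: second roll is 3 or more

# Convexity: values lie in [0, 1], and p = 1 when H logically implies the event.
convexity = 0 <= p(A1, H) <= 1 and p(H, H) == 1
# Addition rule.
addition = p(A1 | A2, H) == p(A1, H) + p(A2, H) - p(A1 & A2, H)
# Multiplication rule.
multiplication = p(A1 & A2, H) == p(A1, H) * p(A2, A1 & H)
```

Exact fractions are used so that the two sides of each rule can be compared with equality rather than a tolerance.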




4.  THE  USE  OF  SCORING  RULES 

Having interpreted probability in two important ways, let us
turn to the defense of the thesis of the inevitability of probability.
The task is to study uncertainty, particularly in the context of AI
and ES. As scientists and engineers we would expect to measure our
object of study, to describe the uncertainty numerically. If we agree
to do this, we have to decide what rules the numbers obey: for
example, can we add them, like lengths? One way is to think of
possible rules and choose some that seem reasonable. This is the method
of classical statistics, fuzzy logic and belief functions. There is
another method.

Suppose that in expressing your belief in A, given H, you
provide a numerical value $a$. In what sense is $a$ a "good"
measurement of your belief? De Finetti (1974/5) had the idea of introducing a
score function, which scores your measurement or, as we usually
prefer, your assessment of your uncertainty of A, given H. For
two functions $f_1$ and $f_0$ the score, when $a$ is announced as the
assessment, is defined to be:

$f_1(a)$ if both A and H are true,
$f_0(a)$ if H is true, but A false, and
zero if H is false.

De Finetti used the quadratic, or Brier, score: $f_1(a) = (1-a)^2$ and
$f_0(a) = a^2$. With the quadratic, $a$ near 1 (0) will give a low score
when A is true (false) and H true. If H is false the statement about A is
irrelevant since it was made on the supposition of H.



Suppose now that you, or the expert in ES, does this with
several event pairs $(A_i, H_i)$, is scored on each, and the scores are added.
Then de Finetti showed, for the quadratic rule, that the values $a_i$
must obey the rules of probability. Lindley (1982) generalized the
result and showed that virtually any score leads to probability: some
scores are eccentric and result in only two possible values for $a$
whatever be A and H. A consequence of de Finetti's result is that
someone using rules for the combination of the $a_i$ that are not
probabilistic (for example, those of belief functions) will have a
worse score, whatever be the truth or falsity of the A's and H's,
than the probabilist. Notice how eminently practical this approach is.
The "expertize" of an expert could be assessed by keeping a check on
his scores. Of two probabilists, either one may do better than the
other, but both will do better than someone not using the probability
calculus.
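
The scoring scheme of this section is easy to state in code. The sketch below is an illustration only, under assumptions of my own: the function names and the belief of 0.3 are hypothetical, and the grid search is just a convenient way to display a well-known property of the quadratic score, namely that the expected penalty is smallest when the announced value equals the probability you actually hold.

```python
def quadratic_score(a, a_true, h_true):
    """Penalty for announcing a as the assessment of A given H (lower is better)."""
    if not h_true:
        return 0.0  # the statement was conditional on H, so it is not scored
    return (1.0 - a) ** 2 if a_true else a ** 2


def expected_score(a, p):
    """Expected penalty for announcing a when A (given H) is held to have probability p."""
    return p * quadratic_score(a, True, True) + (1.0 - p) * quadratic_score(a, False, True)


# Hypothetical belief of 0.3; try every announcement on a grid of hundredths.
grid = [i / 100 for i in range(101)]
best_announcement = min(grid, key=lambda a: expected_score(a, 0.3))
```

On this grid the minimizing announcement is the belief itself, which is the sense in which the score rewards honest assessment.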

5. AXIOMATIC APPROACH

In an alternative approach we think about the concept of
uncertainty and try to latch onto simple, basic principles that ought
to be present in any study of uncertainty; such that any violation of
a principle would, when exposed, make the argument look ridiculous.
The principles, self-evident truths, are called axioms and from these
we would hope to deduce, by mathematical reasoning, the rules that the
numbers obey. Euclidean geometry is the famous example of this
procedure when applied to the measurement of length. This programme was
first carried out for beliefs in 1926 by Ramsey (1931). The best-known
example is Savage (1954). De Groot (1970) presents what is perhaps
the most readable version. All these approaches lead to the result
that the numbers must obey exactly the three rules of probability above.
In other words, the 'axioms' of probability have been deduced from other,
simpler ideas that more legitimately can, because of their
self-evidentiary nature, be called axioms.

Let the converse be emphasized: any violation of the rules must
correspond to some violation of the basic axioms, of those rules whose
violation would look ridiculous. We really have no choice about the
rules governing our measurement of uncertainty: they are dictated to
us by the inexorable laws of logic. Of course, they are entirely
dependent on the chosen axioms and the history of mathematics warns us
not to be too complacent about the "sacred" rightness of axioms.
But at the moment, the axioms are unassailed and all variants
produce minor variants in probability.


6. COHERENCE

At this point we should perhaps digress to discuss an important
aspect of the Ramsey/Savage/de Finetti approaches that is often
overlooked. The discussion will also help to explain why non-probabilistic
views have had some success in AI or ES even though the ideas are
unsound. The rules of probability show how different uncertainty
statements have to fit together. Thus, the multiplication rule above
refers to three assessments and says that one of them must be the
product of the other two. Instead of "fitting together"
we talk of coherence. The results just described can be stated as
showing that coherence can only be achieved by means of probability.
We may say belief functions are incoherent (they do not obey the
addition rule).

Coherence is not peculiar to the measurement of belief. It
applies to all measurement: for example, of length. If ABC is a
triangle with a right angle at B, it makes perfectly good sense to say
AB = 2 or AC = 4 or BC = 3, or even to make two of these
statements together. But make all three together and you are incoherent,
for Pythagoras demands that $AC^2 = AB^2 + BC^2$, which is not true of
the numbers given. Similarly one can say that $p(A_1 \mid H) = 1/2$ or
$p(A_2 \mid A_1 \cap H) = 2/3$ or $p(A_1 \cap A_2 \mid H) = 1/4$, but one cannot make all
three statements simultaneously. The multiplication law replaces
Pythagoras. It is curious that coherence is strictly adhered to with
lengths but often ignored with beliefs, reflecting the immaturity of
belief measurement.

And  that  explains  why  non-probabilistic  procedures  can  sometimes 
appear  sensible.  The  adherents  never  make  enough  statements  for 
coherence  to  be  tested.  They  only  tell  us  the  equivalent  of  AB  =  2 
and  AC  =  4  ,  never  discussing  BC  ,  for  to  do  so  would  reveal  the 
unsound  nature  of  the  argument. 
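
The two coherence checks in this section, one for lengths and one for beliefs, can be written side by side. This is a minimal sketch using the numbers quoted in the text; the function names are hypothetical, and exact fractions are an implementation choice made here so that the multiplication law can be tested with equality.

```python
import math
from fractions import Fraction


def pythagoras_coherent(ab, bc, ac):
    """True if three stated lengths fit a triangle with a right angle at B."""
    return math.isclose(ac ** 2, ab ** 2 + bc ** 2)


def multiplication_coherent(p_a1, p_a2_given_a1, p_both):
    """True if three stated probabilities satisfy the multiplication law."""
    return p_both == p_a1 * p_a2_given_a1


# The lengths from the text: AB = 2, BC = 3, AC = 4.
lengths_ok = pythagoras_coherent(2, 3, 4)

# The beliefs from the text: 1/2, 2/3 and 1/4.
beliefs_ok = multiplication_coherent(Fraction(1, 2), Fraction(2, 3), Fraction(1, 4))

# The value the third assessment would need in order to be coherent.
implied_joint = Fraction(1, 2) * Fraction(2, 3)
```

Both checks fail: the squares give 4 + 9 = 13 rather than 16, and the product of the first two assessments is 1/3 rather than 1/4. Any two of the three statements pass unnoticed, which is the point made in the paragraph above.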


7.  BAYES  THEOREM 

One example of coherence is so important in AI and ES that we
should perhaps consider it now. Interchanging $A_1$ and $A_2$ in the
above statement of the multiplication law and recognizing that
$A_1 \cap A_2 = A_2 \cap A_1$, we immediately have that

$p(A_1 \mid H)\, p(A_2 \mid A_1 \cap H) = p(A_2 \mid H)\, p(A_1 \mid A_2 \cap H)$.

Using the equivalent result but with $\bar{A}_2$ replacing $A_2$, we have

$\frac{p(A_2 \mid A_1 \cap H)}{p(\bar{A}_2 \mid A_1 \cap H)} = \frac{p(A_1 \mid A_2 \cap H)}{p(A_1 \mid \bar{A}_2 \cap H)} \cdot \frac{p(A_2 \mid H)}{p(\bar{A}_2 \mid H)}$.

This is Bayes theorem in odds form. (The odds (on) A are simply the
ratio t of $p(A)$ to $p(\bar{A})$: the odds against are the inverse of this.
In practice they are usually quoted as t-1 on or t-1 against with $t \ge 1$.)
To appreciate what it says, temporarily omit H from the notation and language,
recognizing that it is present in every conditioning event in the statement of
the theorem. Then the result is that the odds, $p(A_2)/p(\bar{A}_2)$, of $A_2$ are
changed, due to the additional knowledge of $A_1$, into $p(A_2 \mid A_1)/p(\bar{A}_2 \mid A_1)$ by
multiplying by $p(A_1 \mid A_2)/p(A_1 \mid \bar{A}_2)$. The multiplier is called the likelihood
ratio. It is the ratio of the probabilities of the additional knowledge $A_1$,
given $A_2$ and then given $\bar{A}_2$. Thus an AI system faced with
uncertainty about $A_2$ and experiencing $A_1$ has to update its uncertainty
by considering how probable what it has experienced is, both on the
supposition that $A_2$ is true, and that $A_2$ is false. Any other
procedure is incoherent. Most intelligent behavior is simply obeying
Bayes theorem. A high level of intelligence consists in recognizing a
new pattern. This is not allowed for in Bayes theorem, nor in any
other paradigm known to me. The simple AI systems that we have at the
moment must be Bayesian.
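
The odds form of Bayes theorem amounts to one multiplication. The sketch below is illustrative only: the function names and the numbers (a prior of 1/5 and likelihoods of 9/10 and 3/10) are hypothetical and do not come from the talk. It updates the odds by the likelihood ratio and then confirms that the answer agrees with the more familiar probability form.

```python
from fractions import Fraction


def odds_on(p):
    """Odds on an event of probability p: the ratio of p to 1 - p."""
    return p / (1 - p)


def probability_from_odds(t):
    """Invert odds_on: the probability corresponding to odds t."""
    return t / (1 + t)


def update_odds(prior_odds, p_e_given_a, p_e_given_not_a):
    """Posterior odds = likelihood ratio times prior odds."""
    likelihood_ratio = p_e_given_a / p_e_given_not_a
    return likelihood_ratio * prior_odds


# Hypothetical inputs: p(A2) = 1/5, p(A1 | A2) = 9/10, p(A1 | not A2) = 3/10.
prior = Fraction(1, 5)
posterior_odds = update_odds(odds_on(prior), Fraction(9, 10), Fraction(3, 10))
posterior = probability_from_odds(posterior_odds)

# Cross-check against Bayes theorem in its probability form.
direct = (prior * Fraction(9, 10)) / (
    prior * Fraction(9, 10) + (1 - prior) * Fraction(3, 10)
)
```

With these inputs the prior odds of 1/4 are multiplied by a likelihood ratio of 3, giving posterior odds of 3/4 and a posterior probability of 3/7.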


8.  A  CHALLENGE 


Let us summarize where we have got to in the argument. On the
basis of simple, intuitive rules; or using a technique of scoring
statements of uncertainty; it follows that probability is the only way
of handling uncertainty. In particular other ways are unsound and
essentially ad hoc in that they lack an axiomatic basis.

There is however more than just the inevitability of probability.
There is the consideration that probability is totally adequate for
all uncertain situations so far encountered. This is often denied.

The  following  statements  are  taken  from  Zadeh  (1983) . 

"A  serious  shortcoming  of  [probability-based]  methods  is  that 
they  are  not  capable  of  coming  to  grips  with  the  pervasive  fuzziness 
of  information  in  the  knowledge  base,  and,  as  a  result,  are  mostly  ad 
hoc  in  nature." 

"The  validity  of  [Bayes  rule]  is  open  to  question  since  most  of 
the  information  in  the  knowledge  base  of  a  typical  expert  system 
consists  of  a  collection  of  fuzzy  rather  than  nonfuzzy  propositions." 

Shafer  (1982)  says,  in  comparing  belief  functions  and  Bayesian 
methods,  "The  theory  of  belief  functions  offers  an  approach  that  better 
respects  the  realities  and  limitations  of  our  knowledge  and  evidence." 

I offer a challenge to these writers and to all who espouse
non-probabilistic methods for the study of uncertainty: the challenge
is that anything that can be done by these methods can be better done
with probability. I think this is a fair challenge. It is a
requirement that the method has been used and is not just a topic for
theorizing, which rules out some speculations in the alternative
paradigms. If the challenge fails then we shall really have advanced: for
an inadequacy in probability will have been exposed and the need for
an alternative justified. The challenge is in the spirit of Popper
who partly judges the merit of a theory on its capability of being
destroyed; for the rich calculus of probability leads to many testable
conclusions. It is also relevant to Popperian ideas because he has
discussed certain inadequacies in probability. These have been
disposed of by Jeffreys (1961).

As  these  words  are  being  written  it  is  impossible  to  know  what 
challenges  might  arise.  All  that  can  be  done  is  to  take  material 
already  in  the  literature  and  examine  that.  I  begin  with  fuzzy  ideas. 

9.  PROBABILITY  IN  PLACE  OF  FUZZINESS 

As  an  example  of  a  fuzzy  proposition  Zadeh  (1983)  cites 
"Berkeley's  population  is  over  100,000" 

He  says  it  is  fuzzy  because  "of  an  implicit  understanding  that  over 
100,000  means  over  100,000  but  not  much  over  100,000"  (his  italics) . 

(He might also have added that Berkeley is fuzzy. Does it refer to the
town in Gloucestershire or that in California? And population: does
it merely refer to permanent residents or are students included?
These are not jibes: my point is that nearly all statements are
imprecise.)

The probabilistic approach would be to give a probabilistic
statement about a quantity that can be evaluated. The qualification
is important, de Finetti has emphasized. As far as possible all
probabilities should refer to propositions or events that can realistically
be tested for truth or falsity. This is because we want to use them.
It may be necessary to introduce other propositions but only as aids
to the calculation of testable ones. (In statistics parameters are used
for this purpose. An example in section 14 will use guilt of a
suspect.) A possible quantity to discuss in the fuzzy statement is the
answer the relevant city official in Berkeley would give when asked
for the population of Berkeley. If this is denoted X, then the
probabilistic statement corresponding to that quoted is $p(x \mid H)$ where
H is the knowledge possessed by the maker of the statement. It
would have a mode a little over 100,000 if the statement is in H.

It is important to notice that in applications it may not be
necessary to specify the full probability distribution $p(x \mid H)$.
For example, it may be enough to quote its mean, the expectation of
X given H; what de Finetti calls the prevision of X given H.
More sophistication may require the variance of X, or equivalently,
the prevision of $X^2$ given H. Fractiles of X are another
possibility.

All fuzzy propositions of this type can be interpreted
probabilistically in a manner similar to our treatment of Berkeley.
"Henry is young" needs a little care. It clearly refers to Henry
(whom I take to be a well-defined person) and an uncertain quantity
X, his age. But the description is very vague. Made on campus,
Henry might be only 19: made at a faculty dinner Henry might be 30:
made in a home for senior citizens, he might be 65. Consequently
H is very relevant to this result. Without context $p(X \mid H)$ will need
to be appreciable even for X ~ 65.

10.  NUMERICAL  EXPRESSION  OF  FUZZINESS 
Another example is both more serious and more elaborate.

"John has duodenal ulcer (CF = 0.3)"

(CF is an abbreviation for certainty factor.) It is a well-known
feature of medical studies that many concepts are imprecisely defined
and that a difficulty in using medical records resides in the varied
use different doctors make of the same term. Nevertheless doctors
find it useful to identify features like 'duodenal ulcer'. The
situation can be described probabilistically by introducing A, an
ill-defined but supposedly real ailment, duodenal ulcer, and also $D_i$,
the appreciation of duodenal ulcer by doctor i. The fuzziness
of the concept can be captured by considering $p(D_i \mid A)$ and $p(D_i \mid \bar{A})$,
the probability that doctor i will say John has duodenal ulcer
both when John has, and does not have, true duodenal ulcer. (Useful
comparison can be made with Bayes theorem above: A replaces $A_2$,
$D_i$ replaces $A_1$ and H is omitted from the present notation.)
Notice that A may not be a testable quantity. It is introduced as
a parameter to facilitate the calculation of quantities that are
testable. For example, if the above statement is made by a first
doctor, what is the probability that a second will agree? $p(D_2 \mid D_1)$ can
be evaluated by extending the conversation to include A. For
example, the $D_i$ might be independent, given A.
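
The calculation of the probability that a second doctor agrees can be sketched as follows. Everything numerical here is hypothetical (the prior for the ailment and each doctor's chances of reporting an ulcer are invented for illustration), and the sketch assumes, as the text suggests one might, that the two reports are independent given A. It first uses Bayes theorem to find p(A | D1) and then extends the conversation over A and its negation.

```python
from fractions import Fraction


def second_doctor_agrees(prior_a, p_d1_a, p_d1_not_a, p_d2_a, p_d2_not_a):
    """p(D2 | D1), assuming D1 and D2 are independent given A and given not-A."""
    # Bayes theorem: belief in the ailment after the first doctor's report.
    joint_a = prior_a * p_d1_a
    joint_not_a = (1 - prior_a) * p_d1_not_a
    p_a_given_d1 = joint_a / (joint_a + joint_not_a)
    # Extend the conversation to include A.
    return p_d2_a * p_a_given_d1 + p_d2_not_a * (1 - p_a_given_d1)


# Hypothetical assessments, for illustration only.
agreement = second_doctor_agrees(
    prior_a=Fraction(1, 5),      # p(A)
    p_d1_a=Fraction(4, 5),       # p(D1 | A)
    p_d1_not_a=Fraction(1, 10),  # p(D1 | not A)
    p_d2_a=Fraction(7, 10),      # p(D2 | A)
    p_d2_not_a=Fraction(1, 5),   # p(D2 | not A)
)
```

With these inputs the first report raises the probability of A from 1/5 to 2/3, and the probability that the second doctor concurs comes out at 8/15. The parameter A never has to be observed; it only carries information from one testable report to the other.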


This second fuzzy statement introduces a numerical measure in
the form of a certainty factor, here 0.3. This contrasts with the
apparently similar numerical assertion that the probability (on an
undefined H) that John has a duodenal ulcer is 0.3 in at least two ways.
First, CF's combine by rules that are different from those of the
probability calculus, so that they would inevitably produce worse
scores in an adequate test than would probabilities. Furthermore, these
rules have no axiomatic basis and are merely inventions of fertile,
unconstrained minds. The second difference between CF's and
probabilities is that the operational meaning of the latter is clear
whereas that of the former is not. We may say that probabilities
have standards, possibilities do not. One standard for probability
was mentioned above: balls in an urn. But expectation of benefit or a
uniform distribution may replace these. All measurement requires a
standard and CF's are dubious because they do not have them. What
does CF = 0.3 mean?

The  literature  on  fuzzy  logic  is  vast,  complicated  and  somewhat 
obscure.  I  have  surely  missed  some  examples  that  it  would  be  useful 
to  test  against  the  challenge  which  remains:  anything  fuzzy  logic  can 
do,  probability  can  do  better. 

11.  INCOHERENCE  AND  BELIEF  FUNCTIONS 

We next turn from fuzzy logic to belief functions. I have
already considered a good example of Shafer's (1982) in the discussion
to that paper. It is repeated here partly because to do so is simpler
for me than to take another one; and also because it is then possible
to respond to Shafer's reaction to my probabilistic argument. Before
giving this it might be useful to exhibit incoherence in the use of
belief functions. (The argument also applies to fuzzy methods.)

We follow Shafer and write BEL(A) for the belief in A, omitting
reference to the conditioning event. Now it is possible that

$\mathrm{BEL}(A) + \mathrm{BEL}(\bar{A}) < 1$

(similarly for certainty factors). Write $\mathrm{BEL}(A) = a$, $\mathrm{BEL}(\bar{A}) = b$
so that $a + b < 1$. (Necessarily $a, b \ge 0$.) Let us score such a belief
using the quadratic rule. The possible scores are:

$A$ true: $(a-1)^2 + b^2$
$\bar{A}$ true: $a^2 + (b-1)^2$.

Now replace $a$ by $a'$, $b$ by $b'$ where $a' = a + \varepsilon$, $b' = b + \varepsilon$
and $\varepsilon = \tfrac{1}{2}(1-a-b)$. It easily follows that $a' + b' = 1$ and that both

$(a'-1)^2 + b'^2 < (a-1)^2 + b^2$

and

$a'^2 + (b'-1)^2 < a^2 + (b-1)^2$.

Consequently it is certain (irrespective of whether $A$ or $\bar{A}$ is true)
that beliefs $a$ and $b$ will score worse than probabilities $a'$ and
$b'$, adding to one. The result generalizes with any score.
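
The argument of this section can be verified numerically. The sketch below is illustrative: the starting beliefs of 3/10 and 1/5 are hypothetical, and the function names are mine. It computes the quadratic score under each outcome, applies the shift by half the shortfall, and compares the two pairs of scores. Expanding the squares shows that each score falls by exactly twice the square of the shift, which the last line records.

```python
from fractions import Fraction


def quadratic_scores(a, b):
    """Scores for beliefs (a in A, b in not-A): (score if A true, score if not-A true)."""
    return ((a - 1) ** 2 + b ** 2, a ** 2 + (b - 1) ** 2)


def make_coherent(a, b):
    """Shift both beliefs by eps = (1 - a - b) / 2 so that they add to one."""
    eps = (1 - a - b) / 2
    return a + eps, b + eps, eps


# Hypothetical beliefs with a + b < 1.
a, b = Fraction(3, 10), Fraction(1, 5)
a_new, b_new, eps = make_coherent(a, b)

before = quadratic_scores(a, b)          # scores for the original beliefs
after = quadratic_scores(a_new, b_new)   # scores for the shifted probabilities

# The improvement is the same under either outcome: 2 * eps ** 2.
improvements = (before[0] - after[0], before[1] - after[1])
```

Here the shift is 1/4, the beliefs become 11/20 and 9/20, and the score drops by 1/8 whichever of the two outcomes occurs.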

12.  PROBABILITY  IN  PLACE  OF  BELIEF  FUNCTIONS 
Now for Shafer's example. Imagine a disorder called "ploxoma",
which comprises two distinct "diseases": $\theta_1$ = "virulent ploxoma",
which is invariably fatal, and $\theta_2$ = "ordinary ploxoma", which varies
in severity and can be treated. Virulent ploxoma can be identified
unequivocally at the time of a victim's death, but the only way to
distinguish between the two diseases in their early stages seems to be
a blood test with three possible outcomes, labelled $x_1$, $x_2$ and $x_3$.
The following evidence is available: (i) Blood tests of a large
number of patients dying of virulent ploxoma showed the outcomes
$x_1$, $x_2$ and $x_3$ occurring 20, 20 and 60 per cent of the time,
respectively. (ii) A study of patients whose ploxoma had continued
so long as to be almost certainly ordinary ploxoma showed outcome $x_1$
to occur 85 per cent of the time and outcomes $x_2$ and $x_3$ to occur
15 per cent of the time. (The study was made before methods for
distinguishing between $x_2$ and $x_3$ were perfected.) There is some
question whether the patients in the study represent a fair sample of
the population of ordinary ploxoma victims, but experts feel fairly
confident (say 75 per cent) that the criteria by which patients
were selected for the study should not affect the distribution of test
outcomes. (iii) It seems that most people who seek medical help for
ploxoma are suffering from ordinary ploxoma. There have been no
careful statistical studies, but physicians are convinced that only 5-15
per cent of ploxoma patients suffer from virulent ploxoma.

My reply was as follows. The first piece of evidence (i)
establishes in the usual way that the chances for a person with virulent
ploxoma to have blood-test results of types $x_1$, $x_2$ and $x_3$ are
0.2, 0.2 and 0.6. The second (ii) is subtler for two reasons:
$x_2$ and $x_3$ are not distinguished in the data, and the patients in the
study are not judged exchangeable with other patients so that the
chances $\beta$ in the study and $\gamma$ for the new patients are not necessarily
equal. The first presents no difficulty since the likelihood for the
data is $\beta_1^{r}(\beta_2+\beta_3)^{n-r}$ where $r = 0.85n$ and n is the number of
patients in the study. The distribution of $\beta$ given the data can
therefore be found. Let $p(\gamma \mid \beta)$ be the conditional distribution of $\gamma$
given $\beta$. This concept replaces the single figure of 75 per cent
quoted by Shafer and which yields a discount rate of $\alpha = 0.25$. It
would be possible to suppose that $\gamma$ equals $\beta$ with probability 0.75 and is
otherwise uniform in the unit interval in imitation of belief functions;
but this may be an unrealistic description of the situation. The third
piece of evidence (iii) says the distribution of the chance $\theta$ that
a patient has virulent ploxoma, $p(\theta)$, is essentially confined to
the range (0.05, 0.15). We are now ready to perform the requisite
probability calculations.

Let G be the event that a new patient, George, has virulent
ploxoma and let $g_i$ be the result of his blood test. We require
$p(G \mid g_i, E)$ where E is the evidence. From (iii) $p(G) = \int \theta\, p(\theta)\, d\theta$.
From (i) $p(g_i \mid G, E) = 0.2$ for i = 1, 2 and 0.6 for i = 3. From (ii)

$p(g_i \mid \bar{G}, E) = \iint \gamma_i\, p(\gamma \mid \beta)\, p(\beta \mid E)\, d\beta\, d\gamma = \int E(\gamma_i \mid \beta)\, p(\beta \mid E)\, d\beta$

and the calculations can be completed in the usual way using Bayes'
theorem. If $E(\theta) = 0.10$, $E(\gamma_i \mid \beta) = \beta_i$ and $E(\beta_2 \mid \beta_1) = \tfrac{1}{2}(1-\beta_1)$ then
the probabilities of G given $g_i$ are respectively 0.025, 0.229
and 0.471.
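
The three figures quoted above can be reproduced with a few lines of arithmetic. The sketch is a reconstruction under stated assumptions rather than the author's own working: it takes the prior expectation of 0.10 for virulent ploxoma, the chances 0.2, 0.2, 0.6 from evidence (i), and, as an assumption made here, a posterior expectation of 0.85 for the chance of the first outcome in ordinary ploxoma, with the remaining 0.15 split equally between the other two outcomes as the stated conditional expectation implies.

```python
# Expected chance that a new patient has virulent ploxoma, E(theta).
prior_virulent = 0.10

# Evidence (i): chances of outcomes x1, x2, x3 given virulent ploxoma.
like_virulent = [0.2, 0.2, 0.6]

# Evidence (ii), assumption of this sketch: E(beta_1 | E) = 0.85, and the
# other two outcomes share the remainder equally.
beta_1 = 0.85
like_ordinary = [beta_1, (1 - beta_1) / 2, (1 - beta_1) / 2]

# Bayes theorem for each possible blood-test result.
posterior_virulent = [
    prior_virulent * lv / (prior_virulent * lv + (1 - prior_virulent) * lo)
    for lv, lo in zip(like_virulent, like_ordinary)
]

rounded = [round(x, 3) for x in posterior_virulent]
```

Under these assumptions the rounded posteriors are 0.025, 0.229 and 0.471, matching the values in the text. Only expectations enter, which is why the spread of the distributions plays no part for a single patient.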

It may be objected that this analysis virtually ignores the
uncertainty about the study and about $\theta$. It does so because they are
irrelevant. The interested reader may like to consider the case of
George and Henry and their blood tests. Then the uncertainties will
matter: for example, $E(\gamma_i^2 \mid \beta)$, involving the conditional variance of
$\gamma$, will arise.

Shafer in response says that "Lindley insists that the
uncertainties affecting this study are irrelevant and should be ignored.
Is this reasonable? Suppose that instead of having only 75% confidence
in the study we have much less confidence. Is there not some point
where even Lindley would chuck out the study and revert to the prior
5-15%?" My reply is that Shafer is correct and that the uncertainty
does matter a little, for it affects $E(\gamma \mid \beta)$. Were we to have no
confidence at all in the study then $E(\gamma \mid \beta)$ would not depend on $\beta$,
and $p(g_i \mid \bar{G}, E)$ would be simply $E(\gamma_i)$ about which no information is
given. (The prior on $\theta$ seems irrelevant.)

Consequently I feel that the challenge has been well met with
the example and, by a Popperian argument, the credibility of
probability theory is increased.

13.  COMPLEXITY,  COVERAGE,  DECISIONS  AND  RICHNESS 

Here  are  four  miscellaneous  remarks. 

(1) It should be noted that fuzzy logic and belief functions
are considerably more complicated concepts than those of probability.
With belief functions we start effectively with probabilities over the
power set of the original events, itself much more complicated than
the original set, and then have to elaborate on that. Dempster's rule
of combination is vastly more involved than Bayes and then only applies
in certain cases. Fuzzy logic leads to non-linear programming and
contains great complexities of language and ideas. Yet probability is
extremely simple, using only three rules and containing rich concepts
like independence and expectation.

Certainly if my challenge fails it will be necessary to
introduce some change into probability ideas, which will almost surely
increase the complexity, yet be necessary and rewarding. But until that
happens is it not best to accept the advice of William of Ockham and
not multiply entities beyond necessity?

(2)  It  is  not  implied  in  the  challenge  that  probability  can 
handle  every  problem  involving  uncertainty:  the  claim  is  merely  that 
probability  can  do  better  than  the  alternatives.  I  believe  that  it  has 
the  potentiality  to  solve  every  uncertain  situation  but  there  are  some 
for  which  the  available  techniques  are  inadequate.  It  is  absurd  to 
think  that  any  paradigm  can  quickly  resolve  every  relevant  puzzle;  some 
may  resist  solution  for  decades.  For  example,  the  medical  problem  of 
handling  large  numbers  of  indicants  in  diagnosis  is  currently  unresolved 
because  we  do  not  have  adequate  techniques  for  handling  the  complicated 
dependencies that exist. (And certainly belief functions do not.) We
need more research into applied probability and less into fancy
alternatives.

(3)  Why  do  we  want  to  study  uncertainty?  Aside  from  the 
intellectual  pleasure  it  can  provide,  there  is  only  one  answer:  to  be 
able  to  make  decisions  in  the  face  of  uncertainty.  Studies  that  do  not 
have  the  potentiality  for  practical  use  in  decision-making  are  seriously 
inadequate. An axiomatic treatment of decision-making shows (Savage (1954),
DeGroot (1970)) that maximization of expected utility is the only
satisfactory procedure. This uses, in the expectation calculation, the
probabilities and these, and only these, are exactly the quantities needed
for coherent decision-making by a single decision-maker. Only the utilities,
dependent on the consequences, not on the uncertainties, need to be added
to make a rational choice of action. How can one use fuzzy logic or
belief functions to decide? Indeed, consider a case where
$\mathrm{BEL}(A) + \mathrm{BEL}(\bar{A}) < 1$. Because you have so little
belief in either outcome do you,
like Buridan's ass, starve to death in your indecision between $A$ and
its  negation?  Reality  demands  probability. 
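
As a sketch of the rule referred to in point (3), in notation that is an assumption of this note rather than the paper's (decisions $d$, exclusive and exhaustive uncertain events $\theta_j$, utility $u$, information $H$), maximization of expected utility can be written:

```latex
\bar{u}(d) = \sum_{j} u(d, \theta_j)\, p(\theta_j \mid H),
\qquad
d^{*} = \arg\max_{d} \bar{u}(d).
```

Only the probabilities $p(\theta_j \mid H)$ and the utilities $u(d, \theta_j)$ enter the calculation, which is the sense in which these are the quantities needed for a decision.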

(4)  It  is  sometimes  said,  as  in  the  quotes  from  Zadeh  above, 
that  probability  is  inadequate.  This  sense  of  inadequacy  sometimes  arises 
because  people  only  think  of  probability  as  a  value  between  0  and  1, 
forgetting  the  whole  concept  of  coherence  and,  in  particular,  ignoring 
the  addition  and  multiplication  laws.  In  fact  probability  is  a  rich  and 
subtle  concept  capable  of  dealing  with  beautifully  delicate  and  important 
problems. This richness is hard to convey without deep immersion in
the  topic.  In  order  to  display  this,  and  also  to  try  to  avoid  the 
impression  that  this  paper  is  entirely  concerned  with  bashing  other 
ideas,  I  conclude  by  discussing  a  situation  that  arises  in  forensic  science 
or  criminalistics.  It  has  been  much  discussed  in  the  literature;  a 
convenient  reference  is  Eggleston  (1983).  An  almost  identical  problem 
has been considered by Diaconis and Zabell (1982) using Jeffrey's rule.
For reasons given below, I think their treatment is unsatisfactory.


14.  A  PROBABILITY  EXAMPLE 

A crime has been committed by a person who is to be found amongst
a population of $(n+1)$ people. One of these is referred to as the
suspect, the others are labelled in a non-informative way from 1 to $n$. Let
$G_s$ be the event that the suspect is guilty, $G_i$ that person $i$ is
$(1 \le i \le n)$. Initially $p(G_s) = \pi$, $p(G_i) = (1-\pi)/n$ for all $i$. (Some
forms of the problem have $\pi = (n+1)^{-1}$, which probabilistically does not
distinguish the suspect from the other $n$.)

An  investigator  studying  the  crime  says  "the  evidence  suggests 
the criminal is left-handed." This is a fuzzy statement and its
probabilistic interpretation requires care. After discussion the investigator
says that the probability that the criminal is left-handed is $P$. This
is still ambiguous. Diaconis and Zabell appear to interpret it to mean
that the probability that the criminal will be found amongst the
left-handers in the group of $(n+1)$ is $P$. I think a British forensic
scientist would mean that if he had the criminal in front of him, the
probability that he would be found to be left-handed is $P$. The former
is the chance of guilt amongst left-handers: the latter of
left-handedness amongst the guilty. Also the former requires reference to
the population: the latter does not. Typical forensic evidence makes no
mention of a population, only of the criminal, and so the latter
interpretation is appropriate. There is a confusion between $p(A \mid B)$ and
$p(B \mid A)$.

Working with the forensic interpretation, the formal statement is
$p(\ell_i \mid G_i) = P$, where $\ell_i$ denotes the event that person $i$ is
left-handed ($1 \le i \le n$ and $i = s$). It was emphasized in the discussion
of Bayes theorem that it is essential to consider the evidence both on an
event and on its negation. So here we need, in addition to
$p(\ell_i \mid G_i)$, $p(\ell_i \mid \bar{G}_i)$.
The latter is the chance that anyone is left-handed and may ordinarily
be equated to the frequency of left-handedness in the population, $p$
say. So $p(\ell_i \mid \bar{G}_i) = p$ for all $i$, including $s$. Presumably $P > p$.
(In some forms of the problem $P = 1$ and the forensic evidence is firm.
This can realistically arise when dealing with blood types that can be
identified without error.) Diaconis and Zabell do not consider $p$.
This seems strange because the presence of an unusual trait intuitively
carries more weight than a common one. The formal analysis below will
confirm this.


15. THE ROLE OF ADDITIONAL EVIDENCE

Now consider various forms of additional evidence.

Evidence $E_1$. The suspect is found to be left-handed. In the
notation this is the event $\ell_s$. Simple use of Bayes theorem

$$p(G_s \mid \ell_s) = p(\ell_s \mid G_s)\, p(G_s) / p(\ell_s)$$

yields

$$p(G_s \mid \ell_s) = P\pi / \{P\pi + p(1-\pi)\} \qquad (1)$$

which clearly exceeds $\pi$ when $P > p$. $E_1$ is indicative of the suspect's guilt.

Evidence $E_2$. Person #1 is left-handed. This is $\ell_1$. Now
with both $E_1$ and $E_2$

$$p(G_s \mid \ell_s \ell_1) \propto p(\ell_s \ell_1 \mid G_s)\, p(G_s) = Pp\pi .$$

Similarly

$$p(G_1 \mid \ell_s \ell_1) \propto pP(1-\pi)/n$$

and

$$p(G_i \mid \ell_s \ell_1) \propto p^2(1-\pi)/n \quad \text{for } 2 \le i \le n .$$

Thus

$$p(G_s \mid \ell_s \ell_1) = P\pi / \{P\pi + P(1-\pi)/n + p(1-\pi)(n-1)/n\} . \qquad (2)$$

Rearranging the denominator as $P\pi + p(1-\pi) + (P-p)(1-\pi)/n$ we see that
(2) is less than (1): the knowledge of another left-hander in the
population has slightly decreased the probability that the suspect is guilty.
Notice that when $n = 1$, $p(G_s \mid \ell_s \ell_1) = \pi$: the evidence that
all the population is left-handed has not changed the suspect's probability
for guilt at all.

Evidence $E_3$. There are no left-handers amongst the $n$ people.
Combined with $E_1$ this means that the suspect is the only
left-hander. Denoting $E_3$ by $\ell_0$, a use of Bayes theorem similar to that
employed with $E_1$ and $E_2$ gives

$$p(G_s \mid \ell_s \ell_0) \propto p(\ell_s \ell_0 \mid G_s)\, p(G_s) = P(1-p)^{n} \pi$$

and

$$p(G_i \mid \ell_s \ell_0) \propto p(\ell_s \ell_0 \mid G_i)\, p(G_i) = p(1-p)^{n-1}(1-P)(1-\pi)/n .$$

Hence

$$p(G_s \mid \ell_s \ell_0) = P\pi / \{P\pi + p(1-\pi)(1-P)/(1-p)\} . \qquad (3)$$

This clearly exceeds $p(G_s \mid \ell_s)$, equation (1), if $P > p$, showing that
$E_3$ increases the probability that the suspect is guilty. Indeed, if
$P = 1$, (3) gives 1 as it should.

Evidence $E_4$. There is at least one left-hander amongst the $n$
people.

$E_4$ is the negation of $E_3$ and may be written $\bar{\ell}_0$. It differs
from $E_2$ in that the latter names a specific left-hander, #1. We have

$$p(\ell_s \bar{\ell}_0 \mid G_s) = p(\ell_s \mid G_s) - p(\ell_s \ell_0 \mid G_s) = P - P(1-p)^{n}$$

and

$$p(\ell_s \bar{\ell}_0 \mid G_i) = p(\ell_s \mid G_i) - p(\ell_s \ell_0 \mid G_i) = p - p(1-p)^{n-1}(1-P) .$$

A further use of Bayes theorem gives

$$p(G_s \mid \ell_s \bar{\ell}_0) = \frac{P\pi - P(1-p)^{n}\pi}{P\pi + p(1-\pi) - (1-p)^{n}\{P\pi + p(1-\pi)(1-P)/(1-p)\}} . \qquad (4)$$

If $n = 1$ this gives $\pi$ in agreement with $p(G_s \mid \ell_s \ell_1)$, equation (2).
It is easy to see that $p(G_s \mid \ell_s \bar{\ell}_0) < p(G_s \mid \ell_s)$,
equation (1), so that $E_4$
slightly decreases the probability of the suspect's guilt.

Now  for  a  subtlety:  compare  (2)  and  (4),  that  is  the  probability 
that  the  suspect  is  guilty  given,  in  (2),  the  name  of  a  left-hander  and 
in  (4)  the  mere  presence  of  a  left-hander.  These  are  different.  It  is 
not  too  hard  to  verify  by  induction  on  n  that 

$$p(G_s \mid \ell_s \ell_1) < p(G_s \mid \ell_s \bar{\ell}_0)$$

for $n > 1$, so that the definitive knowledge of #1's left-handedness
reduces  the  suspect's  guilt  probability  by  more  than  does  the  mere 
evidence  of  someone's  left-handedness. 
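
Equations (1) to (4) can be checked numerically. The sketch below is an illustration and not part of the original analysis. It assumes, as the derivation appears to, that given the identity of the culprit the handedness of each person is independent, with the culprit left-handed with probability $P$ and everyone else with probability $p$. The function names and the parameter values are hypothetical choices made for this example.

```python
from itertools import product


def posterior_suspect(n, pi, P, p, evidence):
    """p(suspect guilty | evidence), by enumerating every culprit and
    every pattern of handedness.

    evidence(hands) returns True when the pattern agrees with what was
    observed; hands[0] is the suspect, hands[i] is person i.
    """
    people = range(n + 1)                       # index 0 is the suspect
    prior = [pi] + [(1 - pi) / n] * n
    num = den = 0.0
    for g in people:                            # g is the guilty person
        for hands in product([True, False], repeat=n + 1):
            if not evidence(hands):
                continue
            w = prior[g]
            for i in people:
                q = P if i == g else p          # chance that i is left-handed
                w *= q if hands[i] else 1 - q
            den += w
            if g == 0:
                num += w
    return num / den


def closed_forms(n, pi, P, p):
    """Equations (1)-(4) as written in the text."""
    eq1 = P * pi / (P * pi + p * (1 - pi))
    eq2 = P * pi / (P * pi + P * (1 - pi) / n + p * (1 - pi) * (n - 1) / n)
    eq3 = P * pi / (P * pi + p * (1 - pi) * (1 - P) / (1 - p))
    tail = (1 - p) ** n
    eq4 = (P * pi - P * tail * pi) / (
        P * pi + p * (1 - pi) - tail * (P * pi + p * (1 - pi) * (1 - P) / (1 - p))
    )
    return eq1, eq2, eq3, eq4


# Hypothetical values chosen only for illustration.
n, pi, P, p = 3, 0.25, 0.9, 0.1
brute = (
    posterior_suspect(n, pi, P, p, lambda h: h[0]),                     # E1
    posterior_suspect(n, pi, P, p, lambda h: h[0] and h[1]),            # E1, E2
    posterior_suspect(n, pi, P, p, lambda h: h[0] and not any(h[1:])),  # E1, E3
    posterior_suspect(n, pi, P, p, lambda h: h[0] and any(h[1:])),      # E1, E4
)
for label, b, c in zip(("(1)", "(2)", "(3)", "(4)"), brute, closed_forms(n, pi, P, p)):
    print(label, round(b, 4), round(c, 4))
```

With these values the enumeration and the closed forms agree, and the ordering (2) < (4) < (1) < (3) claimed in the text is reproduced.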

I  leave  the  reader  to  think  out  whether  the  following  argument 
is correct. Knowing there is a left-hander in the $n$ ($E_4$), no
information about the suspect's guilt can possibly be provided by telling me
the number of one of them. Accepting this, you are told it is #1.
Since (2) and (4) differ (and calling #1 Smith for dramatic effect) the
evidences  "Smith  is  left-handed"  and  "There  are  left-handers,  one  of 
whom  is  called  Smith"  have  different  evidential  value. 




16.  CONCLUSION 


Our argument may be summarized by saying that probability is
the only sensible description of uncertainty and is adequate for all
problems involving uncertainty. All other methods are inadequate. The
justification  for  the  position  rests  on  the  formal,  axiomatic  argument 
that  leads  to  the  inevitability  of  probability  as  a  theorem  and  also  on 
the  pragmatic  verification  that  probability  does  work.  My  challenge 
that  anything  that  can  be  done  with  fuzzy  logic,  belief  functions, 
upper  and  lower  probabilities,  or  any  other  alternative  to  probability, 
can  better  be  done  with  probability,  remains. 


REFERENCES 


De Finetti, B. (1974/5). Theory of Probability (2 vols.). New York:
Wiley.

DeGroot, M. H. (1970). Optimal Statistical Decisions. New York:
McGraw-Hill.

Diaconis, P. and Zabell, S. L. (1982). Updating subjective probability.
J. Amer. Statist. Ass. 77, 822-830.

Eggleston, R. (1983). Evidence, Proof and Probability. London:
Weidenfeld and Nicolson.

Fine, T. L. (1973). Theories of Probability: An Examination of Foundations.
New York: Academic Press.

Jeffrey, R. (1965). The Logic of Decision. New York: McGraw-Hill.

Jeffreys, H. (1961). Theory of Probability. Oxford: Clarendon Press.

Lindley, D. V. (1982). Scoring rules and the inevitability of
probability. Int. Statist. Rev. 50, 1-26 (with discussion).

Ramsey, F. P. (1931). Truth and probability. In The Foundations of
Mathematics and Other Essays. London: Kegan, Paul, Trench, Trubner, 156-198.

Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton:
University Press.

Shafer, G. (1982). Belief functions and parametric models. J. Roy.
Statist. Soc. 44, 322-352 (with discussion).

Smith, C. A. B. (1961). Consistency in statistical inference and decision.
J. Roy. Statist. Soc. B 23, 1-37 (with discussion).

Zadeh, L. A. (1983). The role of fuzzy logic in the management of
uncertainty in expert systems. Fuzzy Sets and Systems 11, 199-227.


TRANSCRIPT  OF  ORAL  PRESENTATION  BY  DENNIS  LINDLEY: 
PROBABILITY  CALCULUS 
FOR  THE  TREATMENT  OF  UNCERTAINTY 


DR.  LINDLEY:  My  thesis  this  afternoon  is  extremely  simply 
stated,  that  the  only  satisfactory  description  of  uncertainty  is 
probability,  that  if  you  do  it  in  any  other  way  then  in  some  sense  it 
will  be  defective. 

We  had  better  start,  I  think,  by  getting  clear  what  I  mean  by 
"probability."  There  are  two  ways  of  answering  the  question  "what  is 
probability?" 

The  first  answer  is  within  the  mathematical  framework,  you  can 
say  what  is  the  mathematics  of  the  subject.  The  second  way  to  answer 
the  question  is  to  say  what  it  means  in  the  world. 

Let  me  take  both  approaches.  The  notation  that  I'm  going  to  use 
is  P  of  A  vertical  line  H  to  mean  the  probability  of  an  event  A,  given 
information  H.  (Slide  1) 

The  first  point  I  want  to  make  is  that  probability  is  a  function 
of  two  things:  the  event  A  about  which  you  are  uncertain  and  the 
information H that you have when you make your statement of uncertainty.

There  is  a  lot  of  nonsense  talked  about  probabilities  as  a 
function  of  one  argument.  That  is  clearly  nonsense  because  if  your 
information  changes,  obviously  your  uncertainty  about  the  situation  can 
change  and  so  consequently  your  measure  of  uncertainty  will  depend  on 
your  information  as  well  as  on  the  event  whose  uncertainty  you're 
considering. 

So  we  have  a  function  of  two  arguments  and  though  I'm  sure 
everybody  in  this  room  knows  them,  I  have  just  written  down  the  three 
basic  laws  of  probability. 

The  first  one,  convexity,  says  that  probability  lies  between  zero 
and  one,  which  of  course  is  all  that  most  lay  people  know  about 
probability,  and  also  that  the  probability  of  A  given  H  is  equal  to  one 
if  you  know  that  H  logically  implies  A.  That  is  down  there  to  make  sure 
you  can  tell  the  difference  between  truth  and  falsity. 

The next law is the addition law that says that the probability
of either $A_1$ or $A_2$ occurring is the probability of $A_1$ plus the
probability of $A_2$ minus the probability that they both occur. One is
usually looking at that when $A_1$ and $A_2$ are exclusive and so this last
event cannot occur, and then we have straightforward addition.


Finally there is the multiplication law, the probability that the
two  events  both  occur  is  the  probability  of  one  of  them  times  the 
probability  of  the  second  given  that  the  first  is  part  of  your 
information. 

Notice  that  the  multiplication  law  is  the  only  law  in  which  the 
information  changes.  So  it  plays  a  very  central  role  in  probability 
discussion.  One  could  easily  have  omitted  H  from  the  first  two  laws  but 
not  the  third. 
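
Slide 1 is not reproduced in the transcript. Written out from the spoken description, and so offered as a reconstruction and not as the slide itself, the three laws are:

```latex
% Convexity
0 \le p(A \mid H) \le 1, \qquad p(A \mid H) = 1 \ \text{if } H \text{ logically implies } A.

% Addition
p(A_1 \cup A_2 \mid H) = p(A_1 \mid H) + p(A_2 \mid H) - p(A_1 \cap A_2 \mid H).

% Multiplication
p(A_1 \cap A_2 \mid H) = p(A_1 \mid H)\, p(A_2 \mid A_1 \cap H).
```

Only in the multiplication law does the conditioning information change, from $H$ to $A_1 \cap H$.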

Now many people think of those as axioms, the axioms of
probability. One of the points I want to stress this afternoon is that
to me they're not axioms at all, they are theorems. In fact, one of
the most beautiful pieces of modern mathematics that I know is De
Finetti's proof of the multiplication law.

That's  the  mathematical  answer.  If  you  ask  me  what  is 
probability, I say mathematically it's anything that obeys those laws.
The  next  question  is  what  does  it  mean? 

Now  I  want  to  stress  this  following  point.  It  does  seem  to  me  to 
be tremendously important and yet other people somehow don't seem to
think  it  is. 

We  are  trying  here  to  measure  something.  We  are  trying  to 
measure  uncertainty.  Now  if  you  want  to  measure  anything  in  this  world, 
you  have  to  have  a  standard  of  reference. 

For  example,  if  I  wish  to  measure  this  desk  in  yards,  I  have  to 
do  it  essentially  with  reference  to  a  standard. 

Several  years  ago  there  would  have  been  at  the  National  Bureau  of 
Standards  a  standard  yard.  There  was  a  standard  yard  in  Britain.  There 
was  a  standard  meter  in  Paris.  All  measurements  were  referenced  to  that 
standard. 

If  I  want  to  measure  temperatures  there's  a  standard,  zero 
referring  to  freezing  of  water,  et  cetera. 

Every  measurement  that  you  make  is  with  respect  to  a  standard  and 
here  we  are  trying  to  measure  uncertainty  and  so  what  I  want  to  know  is 
what  is  our  standard. 

Well,  you  can  have  several  standards.  You  can  have  a  standard 
yard at the National Bureau of Standards, or you can have a standard
based,  I  believe,  on  the  wavelength  of  sodium  light.  There  are  lots  of 
standards. 

Here  is  a  standard.  We  have  an  urn  in  which  there  is  a 
proportion  a  of  black  balls.  If  now  with  respect  to  any  event  A  and 
information  H  you  say  that  the  probability  of  A  given  H  is  a,  then  the 
standard  is  the  following:  that  you  are  indifferent  between  a  prize 
contingent  on  A,  and  the  same  prize  contingent  on  a  black  ball  being 
drawn  at  random  from  the  urn. 


You are going to get a prize either if A occurs or if you get a
black ball and if you are indifferent between those two, then that is
your  probability. 

Now that is the interpretation of probability and one of the
criticisms that I have to make of other points of view, for example,
fuzzy logic and belief functions, is that they do not have a standard,
or if they do I can't understand what it is. They do not have a
standard by which to judge things.

So  there  we  have  probability  both  mathematically  and 
interpretatively. Now my thesis, remember, is that the only way to
measure  uncertainty  is  by  means  of  probability. 

Let  us  now  take  what  I  think  is  a  really  rather  practical  point 
of  view.  Let  us  imagine  that  we  are  going  to  have  a  series  of  people 
and  they're  going  to  measure  uncertainty  in  any  way  they  choose.  They 
have  agreed  they  are  going  to  use  numbers.  I  don't  care  how  these 
numbers  come  or  how  they  use  them  as  long  as  they  have  some  method  of 
doing  it. 

Now  let  us  imagine  we  watch  these  people  do  this.  They  all 
assign  their  uncertainties  for  various  things  and  we  ask  a  simple 
question.  "How  good  are  they  at  doing  it?" 

For  example,  suppose  we  were  trying  to  measure  lengths  of  tables. 
You  know  each  person  could  measure  the  length  and  put  the  answers 
together.  In  some  way  we  would  get  somebody  down  from  the  National 
Bureau  of  Standards  who  is  really  super  at  measuring  lengths  and  he'd 
measure  and  we  would  compare  them  and  see  how  good  they  were.  A  very 
simple  little  problem. 

How  are  we  going  to  do  that  with  uncertainty?  If  somebody  says 
the  uncertainty  is  .8,  is  that  good  or  is  it  bad? 

Well,  it  clearly  depends  on  whether  the  event  is  subsequently 
seen  to  be  true  or  not. 

If,  for  example,  we  agree  that  the  bigger  the  number  the  more 
likely  the  event  is  to  be  true  in  some  sense,  a  .8  when  the  event  is 
true is somehow better than .2 when the event is true.

On  the  other  hand,  if  the  event  is  false,  .2  is  better  than  .8. 

Now De Finetti had, I think, a brilliant idea that what we could
do  is  score  people.  So  let  me  now  introduce  you  to  the  idea  of  a  score 
function.  (Slide  2) 

Let  the  uncertainty  of  A  given  H  be  described  by  a  number  a.  I 
don't  care  how  you've  got  a,  you  can  do  it  by  fuzzy  logic,  you  could  do 
it  by  belief  functions,  you  can  do  it  by  sampling  theory  statistics,  you 
could  do  it  by  Jeffrey's  rule,  you  can  do  it  in  any  way  you  like. 




All  I'm  saying  is  let's  suppose  that  you  were  to  assess  the 
uncertainty  of  A  given  H  by  little  a,  and  now  we're  going  to  score  you. 

If A and H are both true, you will get a score which depends on
a, a function f of a. Let's put suffix 1 there corresponding to A being
true.

If on the other hand A is false, you will get a score f zero of
a.

If it turns out that H is false, there is no score at all because
your assessment of uncertainty was conditional on H, so if H is false
you're not in the game.

Now  let  us  suppose  then  that  these  people,  however  they  get  these 
numbers  little  a,  are  scored.  What  we're  going  to  do  is  keep  a  tally  on 
their  scores  and  add  up  all  the  scores. 

Now  that  seems  to  me  a  very  practical  way  of  doing  this  sort  of 
thing  and  in  fact  I  understand  that  it's  done  in  meteorology. 

People  make  a  statement  about  the  uncertainty  of  rain  tomorrow 
and  wait  and  see  whether  it  rains  tomorrow  and  give  them  a  score.  This 
is  repeated  over  several  days  and  the  scores  added.  A  good 
meteorologist  gets  a  low  score  and  a  bad  meteorologist  gets  a  high 
score.  (I'm  thinking  of  these  scores  as  penalty  scores.  They  are  bad 
things.  You  want  to  minimize  them.  You  can  turn  yourself  upside  down 
if  you  like  and  make  them  good,  but  my  convention  is  going  to  be  that.) 

The simplest score function is the quadratic score function,
sometimes called the Brier score function: $f_1$ of a is (a minus 1)
squared and $f_0$ is a squared.

Suppose the event A is a sunny day and you give it value .8. If
the event then turns out to be true, your score is going to be .8 minus
1, that's minus .2, all squared, a little score of .04.

On the other hand, if the event is false, you're going to get .8
squared, you're going to get .64, you're going to get a big penalty
score,  you've  done  rather  badly. 
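
The arithmetic of the sunny-day example can be sketched in a few lines. The function names `quadratic_score` and `total_penalty` are illustrative choices for this note and do not come from the talk.

```python
def quadratic_score(a, event_true):
    """Penalty under the quadratic (Brier) rule:
    f1(a) = (a - 1)^2 if the event is true, f0(a) = a^2 if it is false."""
    return (a - 1.0) ** 2 if event_true else a ** 2


def total_penalty(statements, outcomes):
    """Add the penalties over a sequence of assessments, as in the tally
    described in the talk. Lower totals are better."""
    return sum(quadratic_score(a, t) for a, t in zip(statements, outcomes))


print(round(quadratic_score(0.8, True), 2))    # 0.04, the small penalty
print(round(quadratic_score(0.8, False), 2))   # 0.64, the large penalty
```

A forecaster who says .8 on two days, one sunny and one not, would accumulate `total_penalty([0.8, 0.8], [True, False])`, the sum of the two values above.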

So .8 has done rather well if A is true, and done rather badly if
false.

Then we're going to take all these scores and we're going to add
them up. Now that seems to me a very sensible system of doing these
things. You know, I'd like to take these columnists who are making
forecasts and these other people in different fields making forecasts,
and just check them. I'd love to take some sampling theory
statisticians and just see how well they do with their inferences.


Now De Finetti proves a most remarkable result. He proved that
with this quadratic score function, those numbers a had better obey the
rules of probability. Whatever happens, whether the A's there are true
or false, you will do better if you make those numbers obey the rules of
probability. Those are the three rules that I had on my first slide.

And so consequently De Finetti proved the rules of probability.
They were theorems resulting from these assumptions.

VOICE: What does it mean, "do better?"

DR.  LINDLEY:  The  score  will  be  less,  whatever  happens. 

VOICE: Is it the expectation of the score...

DR.  LINDLEY:  No,  it's  for  sure.  There  is  no  expectation 
involved  in  this.  It's  for  sure.  Whether  the  events  are  true  or  false 
for  sure  is  here. 
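
The "for sure" claim can be illustrated in the simplest case, which is only a special instance and not De Finetti's general proof. Assume, for this sketch, that someone states a for an event A and b for its negation, with a + b not equal to 1, and is scored by the quadratic rule on both statements. Moving (a, b) the shortest distance onto the line a + b = 1 lowers the total penalty whether A turns out true or false. The function names are hypothetical.

```python
def penalties(a, b):
    """Total quadratic penalty for stating a for A and b for not-A,
    returned for each of the two possible outcomes."""
    if_a_true = (a - 1) ** 2 + b ** 2        # A true, not-A false
    if_a_false = a ** 2 + (b - 1) ** 2       # A false, not-A true
    return if_a_true, if_a_false


def coherent_version(a, b):
    """Shift both statements equally so that they add to one."""
    shift = (1 - a - b) / 2
    return a + shift, b + shift


a, b = 0.6, 0.7                              # incoherent: the two add to 1.3
old = penalties(a, b)
new = penalties(*coherent_version(a, b))
print([round(x, 3) for x in old])            # penalties for the original statements
print([round(x, 3) for x in new])            # smaller in both outcomes
```

No probabilities for the outcomes are used anywhere in the comparison, which is the point of the answer given to the questioner.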

Now  you  might  say,  well,  that's  an  interesting  result  but  I  think 
you've  sort  of  cheated  because  you  have  made  this  score  go  near  to  one 
for  truth  and  zero  for  falsity.  I've  really  forced  it  into  being 
probability, haven't I? But lo and behold I've not because it turns out
that, almost whatever those two functions $f_1$ and $f_0$ are, the same
result persists.

Whatever  function  you  take  there  the  numbers  that  you  get  will 
obey  the  laws  of  probability,  at  least  with  a  little  catch. 

Here's  the  catch.  Suppose,  for  example,  these  two  are 
exponentials.  Not  quadratic,  but  exponentials.  The  numbers  that  you 
would  turn  out  to  be  giving  would  be  the  logarithm  of  the  odds  rather 
than  the  probabilities,  so  in  other  words  I  would  have  to  turn  all  those 
probability rules into log-odds rules, which could be done. They'd look
just a bit messy in log-odds form, but that's what would happen.

What  happens  is  if  you  take  almost  any  score  function,  the  person 
will  give  you  a  known  transform  of  probabilities.  What  transform  it  is 
depends on $f_1$ and $f_0$.

Now  there  are  some  strange  score  functions  that  don't  do  this. 
There  are  some  strange  score  functions  that  would  lead  you  always  to 
give  one  of  two  answers,  say  zero  or  one.  There  are  some  score 
functions  that  push  you  into  giving  one  of  two  numbers,  always  zero  or 
always  one,  never  anything  else,  but  those  are  very  strange  score 
functions  and  one  doesn't  want  to  use  those.  It's  like  making  everyone 
say  true  or  false  in  response  to  a  question.  That  would  be  silly. 

So  consequently,  if  you  were  to  use  any  reasonable  score  function 
then  the  numbers  that  you  would  get  would,  possibly  after  a  transform, 
obey  the  rules  of  probability. 
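
One way to see which transform goes with which score function is sketched below. Two assumptions are made that are not in the talk. First, the exponential score functions are taken to be f1(a) = exp(-a) and f0(a) = exp(a), since the talk does not give their exact form. Second, the check uses an expected-penalty calculation for a person whose probability is q, which is a different route from the sure-loss argument of the talk. All names are illustrative.

```python
import math


def expected_penalty(a, q, f1, f0):
    """Expected penalty for stating a when your probability for the event is q."""
    return q * f1(a) + (1 - q) * f0(a)


def best_statement(q, f1, f0, lo, hi, iters=200):
    """Ternary search for the statement a that minimises the expected
    penalty on [lo, hi]; assumes the expected penalty is convex in a."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if expected_penalty(m1, q, f1, f0) < expected_penalty(m2, q, f1, f0):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


q = 0.8
# Quadratic score: the best statement is the probability itself.
a_quad = best_statement(q, lambda a: (a - 1) ** 2, lambda a: a ** 2, 0.0, 1.0)
# Assumed exponential score: the best statement is half the log-odds.
a_exp = best_statement(q, lambda a: math.exp(-a), lambda a: math.exp(a), -20.0, 20.0)

print(round(a_quad, 4), q)
print(round(a_exp, 4), round(0.5 * math.log(q / (1 - q)), 4))
```

Under these assumptions the quadratic rule returns q itself, while the exponential rule returns a known transform of q, here half the logarithm of the odds.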




Now  the  key  point  here  is  this.  The  key  point  is  that  the  rules 
by  which  these  numbers  manipulate  are  not  arbitrary.  You  can't  sit  down 
and  think  up  some  clever  rules.  Let  me  quote  somebody  who  said 
something  this  morning.  Somebody  said  about  MYCIN:  they  made  up  their 
own calculus. Well, all I can say is that if they made up their own
calculus  they  were  silly,  because  they're  not  at  liberty  to  make  up 
their  own  calculus.  You  can't  say,  "Oh,  I  rather  like  the  supremum  or  I 
rather  like  the  infimum".  You  can't  do  it.  The  rules  must  be  the  rules 
of  probability. 

Of  course  I  made  some  assumptions  in  deriving  the  result  of  the 
inevitability of probability: essentially that the (arbitrary) scores
added.  Surely  a  modest  assumption:  how  else  would  you  combine  the 
scores? 


Now  there  is  another  approach  that  gives  the  same  answer  and  one 
must  just  mention  it.  It's  usually  called  the  axiomatic  approach  and  in 
this  country  the  famous  originator  of  it  is  L.  J.  Savage.  Here  you  put 
down  some  reasonable  axioms  and  you  deduce  from  those  axioms  the  rules 
of  probability  again. 

The best exposition of that that I know is in our chairman's
book, Optimal Statistical Decisions, in which the axioms are beautifully
spelled  out  and  the  argument  goes  through  and  he  proves  that  the  numbers 
have  to  obey  the  rules  of  probability. 

Let  me  summarize  this  by  saying,  to  me  probability  is  inevitable. 
This  is  the  inevitability  of  probability.  There  are  no  other  ways  of 
doing  this  job  except  in  terms  of  probability.  Any  other  method  will 
surely  produce  a  larger  penalty  score.  This  is  not  a  matter  of 
expectation.  The  argument  is  based  on  surely  doing  this. 

Now  you  might  say,  well,  if  it's  like  that,  if  that  is  the 
situation,  why  have  people  been  doing  these  funny  things,  why  don't  they 
use  it? 


Well,  one  of  the  reasons  is  that  people  don't  always  make  enough 
statements  for  their  stupidity  to  be  revealed.  Let  me  give  you  an 
example  of  this.  (Slide  3) 

Suppose that you were to say the probability of $A_1$ given H is a
half and then you were to say that the probability of $A_2$, given $A_1$ and
H, is 2/3.

Now  those  two  numbers  could  be  anything  you  like,  any  numbers 
between  zero  and  one.  The  Bayesian  world  is  a  very  free  world.  You  can 
have  any  numbers  that  you  like  there,  but  once  you  have  chosen  those  two 
numbers,  your  freedom  has  completely  and  utterly  disappeared  if  you  now 
think about the probability of the intersection of $A_1$ and $A_2$. It must
be  a  third. 
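
The forced value follows from the multiplication law applied to the two stated numbers:

```latex
p(A_1 \cap A_2 \mid H)
  = p(A_1 \mid H)\, p(A_2 \mid A_1 \cap H)
  = \tfrac{1}{2} \times \tfrac{2}{3}
  = \tfrac{1}{3}.
```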




Now  clearly  I'm  not  going  to  trap  anybody  if  all  they  will  give 
me  are  the  first  two  statements.  They  can  be  any  numbers. 

I'm  only  going  to  be  able  to  trap  them  when  the  third  comes  in  as 
well,  so  therefore  I  have  to  be  rather  forceful  in  this  scoring 
business.  I  have  to  demand  of  people  that  they  make  logically  related 
statements. 

Let  me  give  you  an  analogy  which  is  not  quite  perfect  but  might 
help  drive  it  home. 

Suppose that we were doing measurements, ordinary Euclidean
geometry,  and  we  were  going  to  talk  about  right-angle  triangles.  Each 
one  of  you  in  this  room  told  me  about  the  lengths  of  the  two  sides  of 
the  right  angle;  you  told  me  the  height  and  the  base. 

Everyone  in  this  room  could  give  me  a  pair  of  numbers  and 
provided  they  were  positive  I  couldn't  query  them.  But  as  soon  as  any 
one  of  you  gave  me  the  third  side  of  that  triangle  I  would  be  onto  you 
like  a  shot,  because  Pythagoras'  theorem  would  tell  me  what  that  third 
number  would  be,  and  anybody  in  this  room  that  gave  me  a  number  that 
didn't  satisfy  Pythagoras'  theorem,  you  would  all  say,  oh,  he's  crazy. 

That's the same situation here. The first two statements can be
any old numbers, but... The reason, it seems to me, that many people in
their  arguments  don't  fall  into  the  difficulty  is  because  they  don't 
allow  themselves  to  go  near  the  difficulty,  they  don't  give  themselves 
the  chance  of  exhibiting  it. 

It's  very  easy  to  say  that  this  is  .3  and  this  is  .8  and  this  is 
.7,  but  if  you  combine  those  things  together  then  you  get  yourself  into 
difficulty.  This  is  called  "coherence." 

Now I felt that I really just had to say something about Bayes'
theorem.  The  subject  is  rather  peculiarly  called  Bayesian  statistics. 
(Slide  3) 

I’ve  written  out  Bayes  theorem  there  in  its  simplest  form.  I 
have  omitted  H  from  the  notation,  so  that  you've  got  to  add  an  H  all  the 
way through. It just says that the odds prior to A transform into the
odds  posterior  to  A  by  multiplying  by  the  likelihood  ratio. 

The  most  beautiful  example  of  this  that  I  know  is  in  a  court  of 
law.  Let  the  event  A  be  that  the  defendant  is  guilty.  On  the  right 
are  the  odds  on  him  being  guilty  before  A  ,  which  is  some  evidence, 
comes  along  and  on  the  left  are  the  odds  after  the  evidence.  It  says 
that  what  you  have  to  do  is  to  multiply  the  odds  before  you  get  the 
evidence  by  the  ratio  of  probability  of  the  evidence  on  the  assumption 
that  he  was  guilty  to  the  probability  of  the  evidence  on  the  assumption 
that  he  was  innocent. 
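
The slide is not reproduced in the transcript. In notation chosen for this note, with $G$ for "the defendant is guilty" and $E$ for the evidence (both symbols are assumptions, since the subscripts on the speaker's events are lost), the odds form of Bayes theorem being described is:

```latex
\frac{p(G \mid E)}{p(\bar{G} \mid E)}
  = \frac{p(E \mid G)}{p(E \mid \bar{G})}
    \times
    \frac{p(G)}{p(\bar{G})}.
```

The left side is the odds after the evidence, the last factor is the odds before it, and the middle factor is the likelihood ratio, whose numerator and denominator are both needed.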




Those  are  the  two  things  that  are  relevant.  At  the  end  of  this 
talk  I  will  give  you  an  example  of  somebody  only  using  the  numerator 
there,  only  trying  to  get  through  with  the  numerator  and  forgetting  the 
denominator.  Of  course  that  will  produce  a  curious  and  unsatisfactory 
answer. 

I now am going to make a challenge. The challenge is this: that
in  the  study  of  uncertainty  anything  that  can  be  done  by  whatever  it  is, 
can  be  better  done  by  means  of  probability.  (Slide  3) 

This  is  my  challenge,  to  you.  I  make  it  not  in  any  arrogant  or 
conceited  way.  I  think  this  is  the  way  that  science  proceeds.  Science 
proceeds  by  somebody  setting  up  a  theory,  setting  up  a  coconut  shy,  and 
trying  to  destroy  it. 

Those  of  you  who  are  familiar  with  the  work  of  the  British 
philosopher,  Karl  Popper,  will  know  that's  the  keystone  of  his  argument. 
The  argument  is  that  what  a  scientific  theory  should  first  do  is  to  have 
lots  of  deductions  that  can  be  made  from  it. 

There  is  nothing  to  be  said  for  a  theory  that  says  each  planet 
has  got  behind  it  an  angel  pushing  it  around.  There's  nothing  in  that 
theory  because  it  doesn't  tell  you  where  the  planets  are  going,  but  if 
you  take  a  theory  of  Newtonian  attraction  you  can  work  out  where  the 
planets  are  going  to  be,  and  deduce  lots  of  things. 

Having  deduced  all  these  things,  you  then  test  them  and  see  if 
they're  right  and  you  try  to  destroy  the  theory,  and  as  long  as  you 
can't  destroy  it  you  enhance  the  theory. 

So I'm giving you enormous opportunities to destroy this theory.
I challenge you: anything you can do by fuzzy logic, anything you
can do by belief functions, can be better done by probability. Notice
the caveat there: anything that can be done by fuzzy logic.
I'm not saying that probability could do anything. I think it probably
can, but I'm not saying that. What I'm saying is that if it can be done that
way it can be done by probability, and it can be done better that way.

I  can't  respond  to  that  challenge  immediately  because  I  don't 
know  what  you're  going  to  say  so  what  I've  done  is  I've  gone  through  the 
literature a little and taken some examples and discussed them.

Here's  an  example  from  fuzzy  logic.  This  example  is  taken  I 
believe from one of Zadeh's papers. (Slide 4)

The  statement  he  quotes  is  Berkeley's  population  is  over  100,000. 
He  says  that  this  is  a  fuzzy  statement,  because  over  100,000  is  fuzzy  — 
it  means  a  little  bit  over  100,000  but  not  too  much.  I  would  agree  with 
him,  it  is  fuzzy.  What  he  doesn't  also  say  is  the  rest  of  the  thing  is 
fuzzy  as  well. 


What  does  he  mean  by  population?  Does  he  include  the  students 
who  are  only  there  for  part  of  the  year,  or  is  it  only  the  residents? 

Berkeley.  Where  is  Berkeley?  Is  he  referring  to  the  town  in 
Gloucestershire,  England,  or  the  one  in  California? 

Everything  is  fuzzy.  Every  statement  is  fuzzy.  There's  nothing 
peculiar  about  the  over  100,000. 

Now  the  probability  approach  to  this  would  first  of  all  say, 
we've  got  to  think  about  something  well  defined.  Let  me  make  an 
assumption  that  he  was  talking  about  Berkeley,  California,  which  is  part 
of  the  information  H,  of  course.  Let  us  assume  that  part  of  H  includes 
the  knowledge  it  was  Berkeley,  California. 

Now  what  we  would  agree  to  do,  I  think,  might  be  to  go  down  to 
the  relevant  official  —  I  don't  know  what  he's  called  in  the  United 
States  --  in  Berkeley  and  ask  him  what  is  the  population.  That  will  be 
a  number  and  therefore  we  could  make  a  probability  statement  about  X, 
and  then  we  could  go  find  out  what  X  is,  and  we  could  score  it.  It  will 
have,  of  course,  to  be  done  on  the  basis  of  whatever  information  is 
available. 

So  often  people  forget  the  information.  It  may  happen  that  one 
of  you  in  this  room  actually  knows  the  population  of  Berkeley,  in  a 
sense  or  you  may  know  what  the  official  figure  was  last  year,  or 
something  like  that. 

So  here  is  a  statement  in  fuzzy  logic  that  can  perfectly  well  be 
turned  into  a  probability  statement. 

Now let's take another one, a little more sophisticated. John
has duodenal ulcer with CF equal to .3. CF is certainty factor of .3.
(Slide 4)

That  one  is  a  little  more  sophisticated  (and  a  little  more 
serious  where  it's  dealing  with  someone  who  is  sick  for  one  thing)  but 
the  real  thing  is  it's  got  a  number  attached  to  it  of  .3. 

Now  that  is  certainly  very  fuzzy.  Let's  assume  John  is  a 
definite  person,  so  that  there  is  no  fuzziness  about  him.  The  really 
interesting  thing  about  that  is  the  duodenal  ulcer,  if  I  understand  it 
correctly,  is  not  very  well  defined.  Consequently,  what  one  has  to  do 
is  to  think  about  a  concept  of  duodenal  ulcer  without  being  really  clear 
what  it  is. 

On the other hand, there are statements by doctors that John has
got a duodenal ulcer and those are firm statements. He did say duodenal
ulcer, so consequently the probabilistic approach would introduce two
things. We tend to use the Greek alphabet for things that we can't
actually get in touch with directly, and the Roman alphabet for things
that we can. Delta would correspond to this vague thing, duodenal
ulcer. D_i would be the i-th doctor's statement that he has duodenal
ulcer.




The sort of thing we're interested in is if the doctor said he's
got  duodenal  ulcer,  (that's  part  of  our  information)  what  is  the 
probability  that  he  truly  has  duodenal  ulcer? 

Now  notice  that  this  statement  has  associated  with  it  a  number. 
Well,  there  are  two  queries.  What  is  the  standard?  It's  not  the 
question  of  how  did  he  get  .3,  but  what  does  it  mean.  How  can  I  check 
that  value  of  .3?  Balls  and  urns,  or  whatever  it  is. 

Another  point  I  want  to  make  is  those  .3's  cannot  be  combined  by 
the  rules  of  fuzzy  logic  because  if  you  do  combine  them  using  the 
supremum  and  infimum  you  will  do  worse  for  sure  if  De  Finetti  were  to 
come along and score you. You'll just tote up the scores and see that
the  fuzzy  person  had  a  larger  penalty  score  than  the  probabilist. 

Let  me  now  turn  to  belief  functions.  One  of  the  properties  of 
belief  functions  is  that  the  belief  in  event  A  —  I'm  omitting  H  here 
for  ease  of  notation  -  the  belief  in  A,  plus  the  belief  in  not  A  can  add 
up  to  something  less  than  one.  So  if  I  denote  belief  of  A  by  little  a 
and  the  belief  in  not  A  by  b,  a  plus  b  can  add  up  to  something  less  than 
one.  (Slide  5) 

Well,  now  let's  score  a  person.  I  say  that  a  person  who  does 
this  will  for  sure  score  less  than  a  probabilist  who  makes  them  add  up 
to  one. 


a  and  b  are  adding  up  to  something  less  than  one,  so  the  total 
deficiency  is  one  minus  a  minus  b,  so  let  us  take  that  deficiency  and 
add  half  of  it  to  a  and  the  other  half  to  b  giving  me  new  numbers  a 
prime  and  b  prime. 

So if I started with .4 and .2 adding up to .6, the deficiency is
.4, half of it is .2 and I'm now going to add .2 to each of them.

It's  very  easy  to  show  that  the  total  scores  will  be  less  with  a 
prime and b prime. On the fourth line is the score when the event
is  true.  On  the  fifth  line  is  the  score  when  the  event  is  false. 

Whether  the  event  is  true  or  the  event  is  false,  the  probabilist 
with  his  a  prime  and  b  prime  will  for  sure  get  the  smaller  score  than 
the  belief  function  person. 
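The comparison can be checked numerically. The sketch below assumes a quadratic penalty score (a stated value x costs (1 - x)^2 if the event it refers to occurs and x^2 if it does not); the function names are illustrative, and the starting values .4 and .2 are the ones used in the example above.

```python
def total_score(a, b, event_is_true):
    """Quadratic penalty for stating a for A and b for not-A (smaller is better)."""
    if event_is_true:
        # A occurred, so not-A did not.
        return (1.0 - a) ** 2 + b ** 2
    # A did not occur, so not-A did.
    return a ** 2 + (1.0 - b) ** 2


def spread_deficiency(a, b):
    """Add half of the deficiency 1 - a - b to each of a and b."""
    half = (1.0 - a - b) / 2.0
    return a + half, b + half


a, b = 0.4, 0.2
a_prime, b_prime = spread_deficiency(a, b)
for outcome in (True, False):
    print(outcome,
          total_score(a, b, outcome),
          total_score(a_prime, b_prime, outcome))
```

Under this assumed rule the penalty falls by (1 - a - b)^2 / 2 whichever way the event turns out, which is why the revised values do better for sure rather than merely on average.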

In  my  paper,  I  had  dealt  with  one  of  Glenn  Shafer's  beautiful 
examples  concerned  with  an  imaginary  medical  disease  called  ploxoma 
which  I  discussed  before.  There  isn't  the  opportunity  to  discuss  it 
with  you  now  in  any  sort  of  detail. 

Let me just extract from the argument one point. Let us suppose
that  we  have  some  data,  very  obvious  and  straightforward  data,  in  which 
a  number  n  of  trials  have  been  carried  out  and  r  of  those  have  resulted 
in  success.  We  have  r  successes  out  of  n  trials  and  let  us  suppose  that 
the  chance  of  success  on  any  trial  is  beta. 




I have to just say here that I am using the word chance in a
different  sense  from  which  Glenn  Shafer  seemed  to  be  using  it  this 
morning  which  is  why  I  asked  him  that  question.  We  can  discuss  this  if 
need  be. 

But  there  is  a  chance  beta,  and  the  likelihood  is  the  familiar 
binomial likelihood. There is the situation.

Now  in  this  example  of  Glenn  Shafer's  he  says,  well  it  may  be 
that  those  trials  have  some  relevance  to  what  you're  interested  in,  but 
they're  not  quite  the  same. 

For  example,  let  us  suppose  that  those  trials  are  being  carried 
out  in  medical  patients  in  England  and  here  we  are  in  the  United  States. 
Well,  we  may  well  say  to  ourselves  the  chance  beta  in  England  is 
different  from  that  in  the  United  States. 

On the other hand, the British study does tell us something.
It's not entirely irrelevant. We feel that if we were to do the same
sort of study in Washington we wouldn't have exactly the same thing, but
we'd have something like it. The two diseases or whatever it is that
we're studying are perhaps not quite the same thing, but they're
similar.

This is a very real situation and I thought a very fine point to
bring out, the fact that quite often one has data that is of some
relevance to what you're studying but doesn't fit absolutely perfectly.
So you cannot say there's that data with chance beta, here I have
another situation with chance beta and so now I learned something about
beta from those r statistics out of n trials, and I can now apply it
here. That's not true in many cases.

Well,  what  do  we  do?  The  simplest  thing  to  do  is  to  imagine  that 
in  Washington  the  chance  is  some  number  gamma.  Gamma  is  not  the  same  as 
beta  but  they're  related.  What  we  would  do  is  say  we're  uncertain  about 
gamma  and  so  we  would  think  about  the  probability  distribution  of  gamma 
given  beta,  which  is  an  assessment  of  how  like  the  Washington  population 
is  to  the  relevant  British  population  in  respect  of  what  this  success  is 
that  we're  talking  about.  Then  one  can  infer  what  the  probability  of 
gamma  is  by  the  usual  rules  of  probability. 
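A rough numerical sketch of this calculation follows. Everything beyond the binomial likelihood is an assumption made for illustration: the uniform prior on beta, the bell-shaped form chosen for the distribution of gamma given beta, the `spread` parameter that expresses how alike the two populations are judged to be, and the data r = 7, n = 10.

```python
import math


def grid(m=99):
    """Interior grid points on (0, 1)."""
    return [(k + 1) / (m + 1) for k in range(m)]


def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]


def posterior_beta(r, n, points):
    """p(beta | r, n) on the grid: binomial likelihood times a uniform prior."""
    return normalise([b ** r * (1.0 - b) ** (n - r) for b in points])


def gamma_given_beta(beta, points, spread):
    """p(gamma | beta): a bell-shaped kernel centred on beta (an assumption)."""
    return normalise([math.exp(-0.5 * ((g - beta) / spread) ** 2) for g in points])


def posterior_gamma(r, n, spread, points=None):
    """p(gamma | r, n) = sum over beta of p(gamma | beta) * p(beta | r, n)."""
    points = points or grid()
    p_beta = posterior_beta(r, n, points)
    p_gamma = [0.0] * len(points)
    for b, pb in zip(points, p_beta):
        for j, pg in enumerate(gamma_given_beta(b, points, spread)):
            p_gamma[j] += pb * pg
    return points, p_gamma


points, p_gamma = posterior_gamma(r=7, n=10, spread=0.1)
mean_gamma = sum(g * p for g, p in zip(points, p_gamma))
print(round(mean_gamma, 3))
```

The only operations used are multiplication of probabilities and summation over beta. A larger `spread` says the British data are less relevant and leaves more uncertainty about gamma.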

There  is  one  point,  incidentally,  that  comes  out  of  this 
argument,  that  when  you  have  to  study  these  problems,  in  a  sense  you 
don't have to think. By that I mean that when you're doing this
discussion  you  know  very  well  that  all  you  have  available  are  those 
three  rules  of  probability  and  nothing  else.  Everything  follows  from 
the  rules  of  probability. 

Consequently,  if  I  want  to  get  hold  of  gamma,  I  know  very  well 
I've  got  to  go  and  use  the  rules  of  probability  and  that  the  relevant 
law  of  probability  will  be  there.  It's  a  recipe.  It's  a  rule  for 
carrying  out  the  calculation. 


Consequently,  there  is  no  need,  in  my  view,  for  any  belief 
function  structure  connecting  the  population  in  England  and  the 
population  in  Washington.  You  can  do  it  perfectly  well  by  probabilistic 
arguments. 

Now  let  me  make  four  rather  miscellaneous  points.  Complication: 
Professor  Zadeh  (I  think  I  quote  him  correctly)  said  this  morning  the 
real  world  is  too  complicated  for  simple  theories.  (Slide  6) 

I  couldn't  disagree  more.  I  was  brought  up  in  a  small  town  in 
England  near  the  little  village  of  Ockham  and  in  the  13th  century  there 
lived  in  this  village  of  Ockham  a  gentleman  named  William,  and  William 
of  Ockham  said  that  entities  should  not  be  multiplied  beyond  necessity. 

I  suppose  I  learned  this  when  I  was  young  but  it  still  seems  to 
me  to  be  extremely  good,  an  extremely  valid  principle  that  you  should 
make  the  situation  as  simple  as  you  possibly  can  and  that  seems  to  be 
admirably  met  by  probability. 

There are only three rules of probability, whereas the number of
entities knocking around in fuzzy logic grows by the hour.
Possibilities. Now today, what is it we have today? Usualities. The
complexity grows and grows and grows. I don't think that's the right
way to go.

The right way is the way that William of Ockham said to us. Make
the situation as simple as you can. If it doesn't work, okay, you'll
have to make it a little more complicated, and if it doesn't work again
make it a bit more complicated.

In fact, if you do make it complicated, you are almost certainly
wrong.

I  don't  know  too  much  about  modern  theoretical  physics,  but  when 
I  talk  to  theoretical  physicists  for  a  moment  they  are  very  worried 
because  their  models  are  getting  too  complicated.  Everybody  is  looking 
around  for  that  simple  thing  because  they  believe,  as  Einstein  did,  in 
simplicity.  It  looks  as  though  Hawking  in  Cambridge  is  getting  very 
near  to  it,  there  is  a  simple  rule  underlying  it  all. 

Simplicity is a thing much to be admired. I'll always remember
the time I spent at the Harvard Business School with Bob Schlaifer.
Schlaifer said to me one day, "People love to delight in complexity, it
obscures all their mistakes."

I  think  there  is  a  lot  to  be  said  for  that.  Complexity  is  to  be 
abhorred.  It's  not  the  right  way  to  think  about  things.  Simplicity  is. 

The  next  question  I'd  like  to  discuss  is  a  tricky  one  of 
coverage.  My  challenge  is  that  anything  that  can  be  done  by  these  other 
methods  can  be  done  by  probability.  The  challenge  is  not  that 
everything  can  be  done  by  probability,  but  that  may  well  be  true. 




I can give you problems that I as a probabilist cannot solve.

Let me give you a very, very simple one indeed, that occurs in a
simplified form of medical diagnosis where there are a number of
symptoms, each of which is either absent or present. There are 40 of
them.

The probability structure of 40 symptoms, each of which is a 0-1
variable, is extremely complicated and I certainly do not know how to
handle it myself for the moment, but that's not to say that probability
is the wrong approach.
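One way to see the size of the difficulty is to count how many probabilities an unrestricted joint distribution over binary symptoms needs. The sketch below is illustrative only; the function names and the symptom counts tried are assumptions.

```python
def joint_distribution_size(num_symptoms):
    """Free probabilities in an unrestricted joint distribution over
    `num_symptoms` binary (0-1) variables: one per cell, minus one
    because the cells must add up to one."""
    return 2 ** num_symptoms - 1


def independent_model_size(num_symptoms):
    """Free probabilities if the symptoms were independent (a strong assumption)."""
    return num_symptoms


for k in (3, 10, 20, 40):
    print(k, joint_distribution_size(k), independent_model_size(k))
```

The count doubles with every added symptom, so the obstacle is the volume of assessment and computation, not the rules of probability themselves.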

It is well known that when you take a scientific theory, if it's
a good theory it poses lots of very difficult problems. The usual
example  quoted  is  the  three-body  problem  in  Newtonian  mechanics.  When 
Newtonian  mechanics  was  first  formulated,  people  thought  it  would  be 
very  easy  to  study  the  motion  of  three  bodies  and  it  turned  out  to  be  a 
very  difficult  technical  problem. 

The  fact  is  that  there  are  lots  of  problems  about  uncertainty 
that we probabilists cannot solve at the moment. I think that they are
mostly  technical  difficulties. 

Now  another  point  that  was  made  this  morning:  decision  making. 
The  axiomatic  approach  that  I  mentioned  of  Jimmy  Savage's  leads  to  the 
rules  for  decision  making  and  the  rules  for  thinking  about  uncertainty 
are  probabilistic.  The  rule  for  making  decisions  is  the  rule  of 
expected  utility. 
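The rule of expected utility can be written down directly. In the sketch below the states, actions, probabilities, and utilities are all hypothetical, chosen only to show the mechanics.

```python
def expected_utility(probabilities, utilities):
    """Sum of probability * utility over the possible states."""
    if abs(sum(probabilities.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must add up to one")
    return sum(p * utilities[state] for state, p in probabilities.items())


def best_action(probabilities, utility_table):
    """Choose the action with the largest expected utility."""
    return max(utility_table,
               key=lambda act: expected_utility(probabilities, utility_table[act]))


# Hypothetical states, actions, and utilities, for illustration only.
probabilities = {"disease": 0.3, "no disease": 0.7}
utility_table = {
    "treat": {"disease": 90, "no disease": 70},
    "wait": {"disease": 10, "no disease": 100},
}
print(best_action(probabilities, utility_table))
```

The check that the probabilities add up to one is deliberate: the weighted sum is only defined as an expectation when the numbers form a probability distribution.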

What  I  do  not  understand  is  how  we  are  supposed  to  make  decisions 
on  the  basis  of  belief  functions  or  fuzzy  logic. 

You remember that slide I had up: (Slide 5) let's put it back
again.  It  can  happen  that  the  belief  in  A  plus  the  belief  in  not  A  is 
less  than  one.  I  don't  see  how  you're  going  to  use  that  sort  of 
situation. 

You know the story, do you, of Buridan's ass? Buridan's ass was
placed  equidistant  from  two  equally  succulent  bundles  of  hay  and  he 
starved  to  death  because  he  could  see  no  reason  for  preferring  one 
bundle  over  the  other  one. 

Belief function people seem to be in exactly the same situation.
They put .2 here and .4 there, leaving the other .4 that's over. What
are they going to do with that, starve to death? They have to make
up  their  minds.  They  have  to  act.  At  least  real  people  have  to  act, 
and  in  order  to  act,  it  seems  to  me  that  you  have  got  to  use  the  full 
force  of  the  probability  argument. 

So  my  question  there  is,  how  on  earth  can  these  arguments  be  used 
in  decision  making? 


It  is  sometimes  argued  that  probability  theory  is  inadequate.  I 
have  pursued  some  of  these  arguments  and  it  doesn't  seem  to  me  that  it 
is  inadequate.  In  fact,  it  seems  to  me  to  be  quite  the  contrary. 

If you carry through the probability argument, I at any rate am
continually  surprised  with  the  richness  of  the  results  that  it  produces. 

Let  me  give  you  a  very  simple  example.  This  occurs  in  a  paper  by 
Diaconis  and  Zabell  in  the  Journal  of  the  American  Statistical 
Association  a  couple  of  years  ago.  It  is  concerned  with  a  trial,  a 
criminal, and some evidence about the criminal being lefthanded.

Now  when  you  do  the  probability  calculations,  it  turns  out  that 
one  piece  of  probability  that  you  have  to  put  in  is  the  probability  that 
a  person  taken  at  random  from  the  population  is  left  handed.  This  is 
not  entered  at  all  into  the  Diaconis  and  Zabell  argument. 

Now  if  I  go  through  the  probability  mechanism,  it  turns  out  that 
I  have  got  to  think  about  the  probability  of  a  random  person  from  the 
population  being  lefthanded  and  I  say  to  myself,  well,  is  that  right  or 
is  it  wrong,  and  surely  it  is  right  and  it  is  a  relevant  thing. 

If  lefthandedness  is  very  rare,  the  evidence  that  the  person  is 
lefthanded  says  much  more  than  if  lefthandedness  is  a  very  common  thing. 
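The effect of rarity can be seen in the likelihood ratio. In this sketch the probability that the criminal is lefthanded (0.8) and the population rates tried are hypothetical values, and the function name is an assumption.

```python
def likelihood_ratio(p_left_if_criminal, p_left_in_population):
    """How much more probable lefthandedness is for the criminal
    than for a person taken at random from the population."""
    return p_left_if_criminal / p_left_in_population


# The same evidence about the criminal, set against a rare, a moderate,
# and a common rate of lefthandedness in the population.
for rate in (0.01, 0.1, 0.5):
    print(rate, likelihood_ratio(0.8, rate))
```

The rarer the trait in the population, the larger the factor by which a lefthanded suspect's odds of guilt are multiplied.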

What  is  happening  is  that  I  carry  through  the  probability 
argument  and  I  find  that  certain  things  enter  into  the  situation  and  I 
say  to  myself,  "well,  is  that  right?"  and  it  always  happens  in  my 
experience that it is right, that those things are indeed relevant.

I think we had an example this morning, though I'm not quite
sure,  when  Glenn  produced  his  example  with  icy  conditions  and  the 
thermometer,  and  he  had  to  bring  in  a  P  and  a  Q  and  it  seemed  to  me, 
thinking  about  it  very  quickly,  that  it  was  very  right  that  P  and  Q 
should  have  entered  into  the  argument  and  if  they  did  not,  then  the 
argument  surely  is  unsatisfactory. 

Surely the argument about lefthandedness is unsatisfactory if it
doesn't  take  account  of  the  rarity  of  lefthandedness. 

I have just a few minutes left and I would like to conclude with
a little example from probability that may interest you. The full
example will appear in the paper and I put it in order partly to be
constructive. I don't want to appear to be knocking everybody down
(which, of course, I am) but I wanted to appear to be a little bit
constructive, and I wanted to try and show you an example of what seems
to me to be the extreme subtlety of the probability argument. You might
say when you've seen this that it isn't subtle at all, but it came upon
me somewhat as a surprise and I think it's come upon other people as a
surprise.


Here is the little problem. The problem is this; it's being
discussed  in  the  legal  literature  quite  a  lot. 

A crime has been committed and it is known that the criminal lies
in a set of n+1 people, one of whom is a suspect and there are n other
people, so there is a suspect S and n other people and they are numbered
1, 2, 3... up to n, and the numbers contain no information. Of course in
reality they would be Smith and Jones, et cetera.

The event that we're interested in, of course, is whether the
suspect is guilty. That is the event G suffix S. (Slide 6)

Then there are the other events, that the other persons are
guilty. G suffix i is the event that person number i is guilty.

Now the evidence is produced that the criminal is lefthanded with
a value of .8. Now the first thing you have to ask yourself is what
does that mean.

In  the  paper  from  which  I've  taken  it,  it  is  held  to  mean  that  if 
we  took  the  population  of  lefthanded  people  —  that  is,  if  we  took  all 
the  lefthanders  in  here,  the  probability  is  .8  that  we  would  find  the 
criminal  amongst  them.  That  is,  conditional  on  lefthanded,  the 
probability  of  the  criminal  being  there  is  .8. 

I don't think that's what I mean and I'm quite certain it's not
what a British forensic scientist would mean. A British forensic
scientist would mean that having got the criminal in front of him, his
probability of finding him to be lefthanded would be .8. I say
British scientists because I have no experience with what American ones
would do in this context.

The  British  forensic  scientist  would  say  had  I  got  the  criminal 
in  front  of  me,  there  is  a  probability  of  .8  he  would  be  lefthanded. 

That  is  a  statement  where  the  event  A  you're  uncertain  about  is 
lefthandedness  and  you're  given  that  he's  the  criminal. 

The  other  statement,  the  one  I  had  before,  is  a  statement  in 
which  lefthandedness  is  in  H  and  A  is  being  the  criminal  so  they're 
upside  down  statements. 

I've  written  the  two  statements  out.  (Slide  6).  The  first 
statement,  the  criminal  would  be  found  amongst  the  lefthanders,  and  the 
second  statement,  given  the  criminal  the  probability  of  being  lefthanded 
is  .8. 

I'm  going  to  use  the  latter  interpretation.  It  is  the  one  that 
seems  to  me  to  be  right. 

Now  some  evidence  comes  in.  The  first  piece  of  evidence  is  that 
the  suspect  is  lefthanded.  Now  as  soon  as  that  comes  in  you  begin  to 
think  he  might  be  guilty,  because  you've  already  had  the  information 
that  the  criminal  has  high  probability  of  being  lefthanded. 




Notice how the rarity of being lefthanded comes in.
Lefthandedness occurs in only about 10 percent of the population, so the
probability of being lefthanded for an ordinary person is about .1.

This  evidence  comes  in.  Now  another  piece  of  evidence  comes  in. 
The  other  piece  of  evidence  is  that  person  number  one  is  also 
lefthanded.  Now  that  sends  the  probability  of  guilt  down  a  little, 
doesn't  it,  because  there's  another  lefthanded  person  knocking  around 
and  so  the  probability  that  the  suspect  is  guilty  is  going  down. 

Now  imagine  another  piece  of  evidence.  This  piece  of  evidence  is 
that  there  is  a  lefthander  amongst  those  n.  You're  not  told  that  number 
one  is  the  lefthander.  You're  told  that  there  is  a  lefthander. 

I've  written  that  as  the  negation  of  1q.  1q  means  that  there 
aren't  any  lefthanders.  The  information,  the  evidence  there,  is  that 
there  is  a  lefthander  amongst  the  n. 

Let me just recapitulate. There is the evidence that the suspect
is  lefthanded  for  sure.  No  fuzziness  or  anything  about  that.  He  is 
lefthanded.  There  is  the  evidence  that  number  one  for  sure  is 
lefthanded.  There  is  the  evidence  that  there  is  somewhere  amongst  those 
n  people  a  lefthanded  person. 

Now  we  can  calculate  the  probability  the  suspect  is  guilty  given 
the  evidence  that  he  is  lefthanded  and  person  number  one  is  lefthanded. 

You can also calculate the probability that the suspect is guilty
given  that  the  suspect  is  lefthanded  and  that  there  exists  another 
lefthanded  person  and  those  two  are  different. 

That was the surprise to me, and the subtlety of the argument to me.
These  two  are  not  the  same.  In  fact,  the  former  one  is  always  less  than 
the  latter. 
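The two probabilities can be computed under a simple model. The sketch below rests on assumptions not spelled out in the talk: every one of the n + 1 people is equally likely a priori to be the criminal, the criminal is lefthanded with probability p = 0.8, each innocent person is independently lefthanded with probability q = 0.1, and the function names are illustrative.

```python
def guilt_given_person_one(n, p=0.8, q=0.1):
    """P(suspect guilty | suspect lefthanded and person 1 lefthanded)."""
    suspect = p * q              # suspect is the criminal, person 1 an innocent lefthander
    person_one = q * p           # person 1 is the criminal, suspect an innocent lefthander
    others = (n - 1) * q * q     # someone else is the criminal, both are innocent lefthanders
    return suspect / (suspect + person_one + others)


def guilt_given_some_lefthander(n, p=0.8, q=0.1):
    """P(suspect guilty | suspect lefthanded and at least one of the n others lefthanded)."""
    # Suspect is the criminal: not all n innocents are righthanded.
    suspect = p * (1.0 - (1.0 - q) ** n)
    # Person i is the criminal: the suspect is an innocent lefthander, and
    # the n others (one criminal, n - 1 innocents) are not all righthanded.
    each_other = q * (1.0 - (1.0 - p) * (1.0 - q) ** (n - 1))
    return suspect / (suspect + n * each_other)


for n in (2, 5, 10):
    print(n, guilt_given_person_one(n), guilt_given_some_lefthander(n))
```

With these assumed values the first probability is smaller than the second for each n tried above; for n = 1 the two pieces of evidence are the same statement and the probabilities coincide.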

So  now  we  have  a  curious  situation.  Someone  has  told  you  that 
there  is  a  lefthanded  person  amongst  numbers  one  up  to  n,  and  this  is 
the  probability  that  the  suspect  is  guilty. 

Now  suppose  I  say,  oh,  yes,  there's  a  lefthanded  person  amongst 
that  group  of  people,  it  won't  do  any  harm,  will  it,  if  you  tell  me  his 
number. 


They  can't  give  you  any  information  to  tell  what  the  number  is  so 
consequently  if  you  now  say,  oh,  his  number  is  one,  you  appear  to  be  in 
the  first  situation  and  that  probability  is  not  equal  to  that  one. 

That  seems  to  me  a  pretty  subtle  and  curious  state  of  affairs  and 
to  describe  probability  theory  as  being  inadequate  in  a  situation  like 
that  does  seem  to  me  to  be  rather  strange. 




It's a beautiful and it's a rich and it's a wonderful subject,
and I commend to your attention that probability is the only
satisfactory measure for uncertainty.

(Applause. ) 




[Handwritten slides accompanying the presentation; illegible in this reproduction.]


DISCUSSION  ON  PRESENTATION  OF  DENNIS  LINDLEY 


DR. DeGROOT: I'm sure that Glenn Shafer and Mr. Zadeh would like
to  have  a  chance  to  take  up  the  challenges  that  were  thrown  out,  but 
perhaps  we  should  have  our  discussants  comment  first. 

I  think  we  should  rule  out  allowing  anyone  to  say  the  same  thing 
at  the  end  of  more  than  three  of  the  four  talks.  You  can  only  say  the 
same  thing  twice. 

Art,  do  you  have  some  comments? 

DR.  DEMPSTER:  Sure.  I  think  I  agree  with  almost  everything 
Dennis  says  and  I  think  the  way  he  says  it  is  wonderful.  The  agreement 
I  guess  is  modulo  the  usual  amounts  of  fuzziness  on  both  our  parts. 

Well,  I  probably  didn't  agree  with  his  first  sentence,  either. 

I  don't  know  that  this  counts  as  repeating  myself  having  said 
this  morning  we  all  need  to  be  exposed  more  to  ideas  of  probability.  I 
think  it's  marvelous  that  we've  had  this  proselytizing  talk. 

My own experience was that I read Feller Volume I in the early
1950s and I've been convinced about probability ever since then, so that
things like Savage's axioms, and scoring rule theorems, and Dutch book
arguments I think are all very pretty but I was already convinced, so
they didn't interest me a great deal.

I  do  think  that  there's  a  slight  misrepresentation  here  about 
belief  functions.  Belief  functions  are  based  on  the  theory  of 
probability  so  almost  everything  that  Dennis  is  saying  really  is  helping 
support  belief  functions  if  you  look  at  it  in  the  right  way. 

I  would  like  to  make  one  comment  that  I  think  is  at  a  slightly 
deeper  level.  One  thing  that  I've  learned,  or  at  least  learned  to  say, 
from reading a little recently about artificial intelligence, glancing
only  an  hour  or  so  at  David  Marr's  book,  is  this  notion  that  things  can 
only  be  understood  if  you  look  at  them  at  many  different  levels. 

My  reaction  to  what  Dennis  is  telling  us  is  that  he's  giving  us  a 
perfect  story  at  one  level  but  it's  too  closed  in.  It  isn't  relating 
far  enough  out  into  the  world. 

Another  thing  that  the  AI  people  preach  or  tell  us  which  is  very 
valid  and  something  I've  been  saying  about  statistics  for  a  long  time  is 
that  you  can't  understand  it  unless  you  relate  it  to  the  goals,  the 
problems,  the  thing  that's  being  worked  on. 

I  might  comment,  perhaps  not  so  specifically  on  what  Dennis  is 
saying  but  a  comment  of  the  chairman  this  morning  that  he  couldn't  see 
why  we  wanted  to  specify  two  numbers  rather  than  one,  it's  hard  enough 
to  do  one  so  to  try  to  do  two  it's  at  least  twice  as  hard,  or  probably 
much  more  from  Dennis'  point  of  view. 




The  thing  is  that  there's  one  of  these  levels  that's  going  on 
that's  concerned  with  constructing  these  probability  models  in  relation 
to  the  goal  of  the  probability  analysis  and  that's  something  that  Glenn 
has  written  a  great  deal  about,  something  that  I  think  is  the  essence  of 
the  problem,  not  what  Dennis  is  talking  about  since  I've  believed  in 
that  for  more  than  30  years. 

The essence of the problem is how do we construct these things in
a way that has some kind of scientific validity and it just does seem to
happen that when you do that (for example in Glenn's example about Fred)
you come up with probabilities, sure, but they get reflected in ways
that lead to upper and lower probabilities or beliefs and
plausibilities, whatever kind of terminology you want to use.

That  I  think  is  operating  at  a  whole  different  level  of 
understanding  that  the  theory  that  Dennis  has  talked  about  doesn't  seem 
to  relate  to. 

DR. WATSON: I think it's very difficult to know how to respond
to  Dennis'  very  clear  argument  of  the  inevitability  of  probability,  but 
it  has  to  be  faced  because  some  people  in  this  room  don't  share  his 
conception  of  this  inevitability  and  you  then  have  to  ask  what's  wrong, 
is  it  the  argument  or  is  it  the  premises. 

I  think  there  are  two  premises  to  the  argument  which  are  worth 
looking  at,  and  I  won't  say  much  about  them  but  it's  something  that 
other  people  may  have  thought  of. 

Firstly, is judging probability the same as judging length?

You'll  notice  that  part  of  the  argument  is  predicated  on  the  assumption 
that  it  really  is  the  same  sort  of  thing. 

I think it's not, but I don't think we ought to spend time
talking  about  that  now. 

The second point is that the argument he presented from De
Finetti was based on scoring rules and I was reminded of a little verse
that was on the wall of a house I stayed in when I was young, about the
Great  Scorer  of  Life.  It  went,  "And  when  the  one  Great  Scorer  comes  to 
write  against  your  name,  he  writes  not  if  you  won  or  not  but  how  you 
played  the  game." 

(Laughter.)

Now  that  version  of  Victorian  morality  is  probably  not  terribly 
appropriate  for  this  afternoon's  discussion  but  it  did  strike  me  that  to 
go  along  with  the  argument  that  Dennis  is  making,  you  had  to  presume 
that  this  scoring  mechanism  was  a  sensible  one  and  I'm  suggesting  that 
one  may  refuse  to  go  along  with  that  part  of  the  argument  and  this  of 
course  allows  you  an  out  from  the  conclusions  of  the  theory. 


The  two  points  I  want  to  take  are  in  different  order  so  I'll  have 
to  show  them  both  at  the  same  time,  I'm  afraid.  Fuzzy  set  theory,  you 
could  argue,  is  not  concerned  with  uncertainty.  It  therefore  does  not 
claim  to  be  a  contender  for  these  numbers  that  are  supposed  to  represent 
uncertainty.  It's  describing  some  different  human  perception. 

Now  you  may  say  that  you  can't  understand  what  perception  it  is 
it's describing, but in my view it is a perfectly valid position to
take that there is a perception different from
uncertainty, namely imprecision, and that fuzzy set theory may be an
appropriate thing for describing imprecision.

I'd  like  to  finish  with  this  last  point  that  Dennis  made  so 
forcibly  about  the  sense  of  adopting  Ockham's  Razor,  which  I  share.  I 
think  we  all  share. 

I  would  argue  that  conversely  the  application  of  probability 
actually  leads  to  enormous  complexity  and  that  what  we  need  is  a  theory 
which  leads  to  simpler  representations  than  is  provided  by  probability 
theory. 


David  Schum  of  Rice  University  has  done  quite  a  bit  of  work  on 
the  application  of  probability  reasoning  in  legal  contexts.  He's  done  a 
very  nice  paper  which  analyzes  the  famous  legal  case,  that  of  Salmon's 
pills,  that  of  whether  Salmon's  pills  killed  somebody  or  not.  I  advise 
you  to  go  and  read  the  papers  if  you  want  a  good  analysis  of  a  very 
simple  legal  inference. 

Now  he  applied  a  form  of  Bayesian  thinking,  of  probabilistic 
thinking,  to  that  case  and  concluded  that  the  number  of  probability 
judgments  that  one  needed  to  make  in  order  to  come  to  some  sensible 
conclusion  was  beyond  all  reason. 

Not  only  that,  but  there  was  not  one  but  a  large  number  of 
different  ways  one  might  set  about  structuring  the  problem. 

I  therefore  think  that  if  you're  going  to  adopt  Ockham's  Razor, 
and  I  do,  you  might  be  led  elsewhere  than  to  probability  theory.  Thank 
you. 

DR. DeGROOT: Dennis, do you want to respond to those comments
before  we  move  on? 

DR.  LINDLEY:  No. 

DR.  DeGROOT:  Glenn,  do  you  have  comments  that  you  would  like  to 
make,  or  do  you  want  to  wait  and  see  how  the  discussion  goes? 

DR. ZADEH: I have a comment. I think that one cannot really
take issue with the conclusions that Dennis drew from the particular
example that he considered involving scoring, but I think that one can
question  the  big  jump  from  whatever  conclusions  that  one  can  draw  from 
that  example  to  the  much  more  sweeping  statement  concerning  the 
inevitability  of  probability  theory.  I  think  there  is  a  big  gap  there. 




I hated to get into it because I have to figure out if what I'm
going  to  say  next  is  a  repetition  of  what  I've  said  already. 

(Laughter.)

In  the  first  place,  even  in  that  example  the  assumption  is  that 
something  is  either  true  or  false,  but  what  happens  if  your  forecast  is 
such  that  it's  a  matter  of  degree? 

For  example,  the  forecaster  says  it  will  be  a  rainy  day  and  it  is 
rainy to a degree, or says it's a warm day and it's warm to a degree.

How  will  that  influence  the  situation?  That's  one  point. 

Another  point  is  this.  I  think  that  the  best  way  of  resolving 
these issues, as I've said already, is to consider particular problems,
such  as,  if  I  say  most  students  are  young,  and  then  I  say  most  students 
are  healthy,  with  the  understanding  that  young  and  healthy  both  are 
fuzzy  predicates,  and  then  I  ask  the  question  "what  fraction  of  students 
are  healthy  and  young?" 

I  would  like  Prof.  Lindley  to  come  up  with  a  probabilistic 
analysis  of  this  sort  of  a  thing,  not  at  this  point  of  course  but  at 
some  point. 

I think it would be very interesting to see how it could be done.
Personally, I think there would be very great complications, if it can
be done at all, whereas using fuzzy logic it's a very simple sort of a
thing and you get an immediate answer.

DR.  LINDLEY:  My  response  to  that  is  if  I  have  your  simple  answer 
I'll  show  you  that  the  probability  one  is  better. 

DR.  DeGROOT:  Are  there  other  comments? 

DR.  WISE:  I  address  Mr.  Watson's  point  about  probability  leading 
to  complexity,  and  that  is  if  you  describe  a  problem,  like  Professor 
Zadeh  did,  and  then  you  work  it  with  probability  you  need  to  make  a 
series  of  assumptions  to  get  some  answer. 

If  you  work  it  with  something  else,  you  may  get  a  very  similar 
answer  and  I  think  the  fact  that  you  had  to  make  the  assumptions  in 
probability  to  get  essentially  the  same  answer  indicates  that  you  have 
implicitly  made  them  using  the  other  theory.  The  mere  fact  that  you  get 
an answer at all indicates that you had made some assumption implicitly
and  you  could  uncover  it  by  doing  it  probabilistically,  assuming 
whatever  is  necessary  to  get  the  same  answer. 

DR.  LINDLEY:  I  think  what  you  said  is  absolutely  right.  I  met 
an  example  recently  in  which  an  argument  had  been  used  and  it  turned  out 
to  be  a  perfectly  sound  argument  from  a  probability  point  of  view,  but 
an  assumption  had  been  introduced  at  one  point  and  you  had  to  say  to 
yourself,  was  that  assumption  reasonable? 


The  argument  went  through  without  this  assumption  being  exposed. 
As  soon  as  you  did  it  by  probability  you  realized  an  assumption,  and  you 
had  to  say  is  that  assumption  reasonable?  In  actual  application  it  was 
reasonable  and  so  the  answer  was  all  right.  If  I  understand  what  you're 
saying,  I  absolutely  agree  with  you.  The  probability  argument  exposes 
what you have to assume.

DR.  DeGROOT:  I'd  like  to  comment  on  your  standard  example  for 
probability. Let me say that I, like Art Dempster, agree with almost
everything you say, with everything in spirit, and only disagree
in detail here and there.

Your  standard  example  for  probability  is  that  I  evaluate  or 
assess  the  probability  of  an  event  A  by  asking  would  I  prefer  to  receive 
a  prize  contingent  on  A  or  contingent  on  some  black  ball  being  drawn 
from  an  urn  of  known  composition. 

I don't like that way of assessing probability. I think about
the event A being nuclear war tomorrow. Would I prefer to receive $5
contingent  on  nuclear  war  tomorrow  or  contingent  on  drawing  a  black  ball 
from  some  box,  no  matter  how  rare  that  black  ball  might  be  in  that  box, 
and  I  think  I  would  probably  go  for  the  black  box.  I  don't  think  five 
bucks  is  going  to  do  me  much  good  tomorrow.  It's  not  enough  to  even  get 
out  of  town,  really. 

The  traditional,  the  classical  phrase  is  ethical  neutrality  of 
the  events  A  that  we  can  assess  in  these  terms,  but  many  events  are  not 
ethically  neutral  to  us,  and  I  think  there  are  other  ways  —  I  still  am 
a  believer  in  probability,  and  I  think  there  are  other  ways  to  assess 
the  probability.  I  think  there's  a  question  there  somewhere. 

DR.  LINDLEY:  I  don't  see  the  ball  and  urn  method  as  a  sensible 
way  of  assessing  probabilities  any  more  than  I  think  it  will  be  sensible 
for  us  to  get  a  van,  and  cart  this  desk  out  to  the  National  Bureau  of 
Standards  and  place  it  next  to  the  standard  yard  and  see  how  long  it  is. 

We  don't  do  things  that  way.  We  don't  compare  them  with  the 
standards.  We  use  other  techniques,  and  I'm  sure  we  have  to  use  other 
techniques  for  assessing  probabilities.  My  point  was  there  is  a 
standard  for  probability.  I  don't  think  you  would  use  it  any  more  than 
you've  used  the  standard  for  length. 

DR.  FORMAN:  Ernest  Forman  from  the  Management  Science  Department 
at  George  Washington. 

In  making  these  estimates  of  probability,  you  talked  about  the 
need  for  a  standard  and  you  talked  about  different  types  of  comparisons. 

Why  not  make  your  estimates  in  terms  of  relative  comparisons,  so 
not  only  should  you  put  a  .4  on  this  and  a  .6  on  this  but  this  is  .6 
to .4 or three to two ratio and then do whatever normalization you have
to  do  to  invoke  the  laws  of  probability. 


It  seems  like  that's  a  straightforward  way  of  doing  it  and  could 
also work in context of the certainty factors, instead of just putting
.3 on it look at all the other alternatives and make judgments as to how
likely  you  think  these  things  are  relative  one  to  another. 
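
The relative-comparison idea in this question can be sketched in a few lines. This is only a minimal illustration, assuming the alternatives are mutually exclusive and exhaustive; the function name `normalize` and the labels "A" and "B" are hypothetical, and only the three-to-two ratio comes from the discussion.

```python
from fractions import Fraction


def normalize(relative_weights):
    """Turn relative judgments (e.g. 3 to 2) into probabilities that sum to one."""
    total = sum(relative_weights.values())
    if total <= 0:
        raise ValueError("relative weights must have a positive sum")
    return {name: weight / total for name, weight in relative_weights.items()}


# Hypothetical alternatives judged to stand in a three-to-two ratio.
ratios = {"A": Fraction(3), "B": Fraction(2)}
probs = normalize(ratios)

for name, p in probs.items():
    print(name, p, float(p))
```

With more than two alternatives the same step applies once each has been judged relative to a common reference.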

DR.  SOLAND:  Professor  Lindley,  I  wonder  if  you  might  give  us  a 
definition  of  uncertainty  that  you  use,  if  perhaps  it's  useful  to  give  a 
definition,  and  also  how  you  interpret  imprecision  and  whether  or  not 
you  contrast  it  with  uncertainty. 

DR.  LINDLEY:  You're  asking  me  about  two  words  in  the  English 
language,  is  that  right? 

DR.  SOLAND:  Yes. 

DR.  LINDLEY:  You're  not  asking  me  about  two  technical  terms 
called  uncertainty  and  imprecision? 

DR.  SOLAND:  Yes,  technical  because  we've  talked  about 
uncertainty  and  lack  of  precision  here. 

DR. LINDLEY: Precision to me is the inverse of the
variance in the technical sense.

DR.  SOLAND:  Well,  but  in  the  sense  that  Prof.  Zadeh  has 
quantified  imprecision  how  would  you  deal  with  that? 

DR.  LINDLEY:  Well,  if  I'm  imprecise  about  the  value  of 
Berkeley's  population,  it  seems  to  me  to  mean  almost  essentially  the 
same  as  I'm  uncertain  about  Berkeley's  population.  I  don't  see  the 
great  distinction  in  the  English  language.  All  I  was  doing  in  this  talk 
was  I  was  talking  about  events.  I  didn't  go  into  the  technical 
complexity  of  quantities.  I  was  talking  about  an  event.  An  event  to  me 
is  uncertain  if  I  do  not  know  whether  it  is  true  or  whether  it  is  false. 

For example, take the event that the millionth digit of π in its
decimal expansion is a certain digit. In some sense I do know whether
it is true or false, since its logical truth or falsity follows from
things I do know, but I haven't done the
calculation, so to me that is an uncertain event, and I will give it
probability of .1.

DR.  SPIEGELHALTER:  You  assume  that  all  your  events  are  going  to 
be  able  to  be  scored  in  the  future,  that  they  are  potentially  verifiable 
propositions. 

Do  you  think  there  is  any  role  for  propositions  that  aren't 
strictly  verifiable?  Let  me  give  an  example  of  something  that  may 
appear  in  expert  systems  for  statistics.  Say,  that  there  may  be  a  node 
corresponding  to  an  assumption  of  normality  in  the  data,  or  linearity  in 
regression,  and  this  for  control  purposes  may  be  an  important  thing  to 
establish  and  some  idea  of  the  compatibility  of  the  data  with  that 
assumption is important, but it's not perhaps something that you might
like  to  give  a  probability  to. 


I  was  wondering  if  you  would  say  how  you  might  deal  with 
propositions  that  aren't  strictly  verifiable. 

DR. LINDLEY: Yes, that's a very important point. When we have
to  do  something,  then  of  course  the  things  that  we're  concerned  with 
typically  are  things  that  can  be  verified  but  it  turns  out  that  when  we 
do  the  calculations  it  is  very  convenient  to  bring  in  things  whose  value 
can  never  be  verified. 

The  simplest  example  is  the  standard  situation  in  statistics  in 
which  we  have  a  number  of  random  variables  that  are  independent  and 
identically distributed.

Now  we  think  about  the  situation  by  bringing  in  things  called 
parameters.  Nobody  ever  knows  what  these  parameters  are  —  nobody  is 
ever  going  to  observe  them,  but  to  bring  them  in  is  enormously 
simplifying  in  the  calculation  of  observable  probabilities. 

For example, suppose I have a sequence $X_1, X_2, \ldots$, up to $X_n$. I
observe the values and I now want to say something about the next one, $X_{n+1}$.

The simplest way for me to do it is to do it through parameters
that I will never observe but I will observe $X_{n+1}$ and this technique is
really  useful. 
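
The role of the unobserved parameter can be illustrated with a small sketch. It assumes, purely for illustration, that the observations are 0/1 values that are independent given an unknown success rate theta, and that theta has a Beta(a, b) prior; neither this model nor the data below come from the discussion. The parameter is integrated out and never appears in the result, which is a probability for the observable next value.

```python
from fractions import Fraction


def predictive_next(observations, a=1, b=1):
    """P(next value = 1 | observed values), for 0/1 data that are iid given
    theta, with theta ~ Beta(a, b).

    Integrating theta out gives the closed form (a + s) / (a + b + n),
    where n is the number of observations and s the number of ones.
    """
    n = len(observations)
    s = sum(observations)
    return Fraction(a + s, a + b + n)


# Hypothetical observed sequence of n = 4 values.
xs = [1, 0, 1, 1]
p_next = predictive_next(xs)
print(p_next)
```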

I note somebody said that my analogy with length wasn't perhaps
right.  Maybe  it's  not  but  here's  another  example.  There  are  plenty  of 
situations  in  which  you  cannot  measure  lengths. 

The  way  you  measure  the  distance  from  A  to  B  is  from  A  to  C  and  C 
to  B.  You  do  it  the  long  way  around  and  there  are  many  situations  like 
that  in  probability  where  you  can't  do  the  thing  directly.  You  have  to 
invent  other  ways  of  doing  it. 

There  are  also  of  course  questions  that  you  can't  verify.  For 
example,  did  Shakespeare  write  Hamlet?  My  personal  probability  that 
Shakespeare  wrote  Hamlet  is  about  .2,  but  nobody  is  ever  going  to  be 
able  to  verify  that,  at  least  it's  most  unlikely,  so  you  say  does  it 
matter? 

Well,  you  have  to  turn  that  into,  say,  the  following  event;  that 
it  will  be  discovered  during  the  next  calendar  year  that  Shakespeare  did 
not  write  Hamlet. 

Now  that's  an  event  that  can  be  tested  and  it's  an  event  that  is 
of tremendous importance to the people of Stratford-on-Avon in England,
because  if  it  ever  was  discovered  that  he  didn't,  the  tourist  industry 
would  collapse. 


How  would  you  calculate  the  probability  that  it  will  be 
discovered  in  the  next  year  that  Shakespeare  did  not  write  Hamlet? 

Well,  you  would  have  to  say,  supposing  he  didn't  write  Hamlet,  what  is 
the  probability  that  it  will  be  discovered?  Supposing  he  did  write 
Hamlet, what is the probability, and so on. You have to work it out
conditional on whether Shakespeare wrote Hamlet.

In  other  words,  you  bring  in  an  event  that  can  never  be  verified 
in  order  to  calculate  a  very  important  event  that  can  be  verified, 
namely,  it  is  to  be  discovered  during  the  next  year  that  Shakespeare  did 
not  write  Hamlet.  Admittedly,  the  probability  of  that  is  very  small, 
but  still  it's  an  event  of  supreme  importance  to  the  inhabitants  of 
Stratford-on-Avon  and  can  be  verified. 
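
The calculation described here is an application of the law of total probability, conditioning on the unverifiable event. A minimal sketch follows; the .2 is the personal probability stated in the discussion, while the two conditional probabilities are hypothetical values chosen only to make the arithmetic concrete.

```python
from fractions import Fraction


def prob_discovery(p_wrote, p_disc_if_wrote, p_disc_if_not):
    """P(discovery) = P(disc | wrote) P(wrote) + P(disc | not wrote) P(not wrote)."""
    return p_disc_if_wrote * p_wrote + p_disc_if_not * (1 - p_wrote)


p_wrote = Fraction(1, 5)           # the .2 quoted in the discussion
p_disc_if_wrote = Fraction(0)      # hypothetical: nothing to discover if he wrote it
p_disc_if_not = Fraction(1, 1000)  # hypothetical chance of discovery within the year

p_discovery = prob_discovery(p_wrote, p_disc_if_wrote, p_disc_if_not)
print(p_discovery, float(p_discovery))
```

The unverifiable event enters only as an intermediate quantity; the output is a probability for an event that can be checked at the end of the year.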

So  I  think  that  there  are  events  that  can  never  be  verified  but 
are  extremely  useful  for  us  to  introduce  into  our  calculations. 

DR. SHAFER: I just wanted to push Dennis to address further the
point  about  the  scoring  argument. 

In  the  case  of  the  unverifiable  events,  what  is  the  relevance  of 
the  scoring  argument? 

DR. LINDLEY: None whatsoever, but it is entirely relevant to the
events  that  can  be  verified. 

DR.  WATSON:  That  was  really  my  point  as  well.  If  I  can  enlarge 
on  it,  if  scoring  doesn't  apply  to  unverifiable  events,  why  should 
probabilities  exist  for  unverifiable  events? 

DR.  LINDLEY:  Well,  because  the  events  that  are  related  to  them 
can  be  scored  and  that  is  enough.  I'm  saying  this  with  some  hesitancy. 
Is  that  right?  My  feeling  is  that  it's  just  like  distance.  If  I 
measure  enough  distances  I  can  infer  those  are  the  distances.  I  think 
it's  the  same  with  probabilities.  If  I  have  enough  events  that  are 
verifiable,  then  I  can  carry  the  argument. 

DR.  DEMPSTER:  Would  you  say,  Dennis,  that  your  personal 
probability of .2 about Hamlet was based on evidence of some kind?
There's  a  kind  of  gulf  between  us  here. 

DR.  LINDLEY:  Oh  yes,  I  have  information.  There  are  several 
pieces  of  evidence.  For  example,  we  do  not  have  any  of  William 
Shakespeare's  handwriting  except  one  signature  which  strongly  suggests 
this  was  a  man  who  found  great  difficulty  in  writing. 

It  is  very  surprising  that  a  schoolmaster  should  have  had  enough 
knowledge  of  Roman  history,  court  behavior  of  the  Tudors  and  things  like 
that  to  have  written  the  plays. 


It's  very  amazing  that  this  should  have  happened  but  on  the  other 
hand there is a person around who did indeed have all that knowledge and
could well have written it, and that was the Earl of Oxford. There are
also other people who had the knowledge and might have written it.

So  there  is  quite  a  bit  of  this  sort  of  circumstantial  evidence 
that Shakespeare was just not in the position to have written the play,
he  didn't  know  enough. 

DR. DEMPSTER: These are all pieces of evidence that point one way.

DR.  LINDLEY:  Right.  The  piece  of  evidence  that  points  the  other 
way,  of  course,  is  that  he  put  them  all  on. 

DR. DEMPSTER: Your theory doesn't make any special distinction
about  the  direction  in  which  evidence  points. 

DR.  KONG:  Can  I  ask  a  question?  It  seems  to  me  that  in  order  to 
get that probability of .2, let's say I'm a Bayesian, what I have to
do  is  I  have  to  look  into  the  time  before  Hamlet,  the  piece  of 
literature,  actually  appeared,  and  then  every  human  being  before  that 
can  write  that  piece  of  literature,  so  I  need  a  prior  probability 
distribution  for  every  one  of  them.  Maybe  I  use  the  uniform 
distribution  so  it's  like  one  over  so  many  billion;  and  then  for  each 
one  of  them  I  have  to  have  like  a  likelihood  of  them  writing  Hamlet  and 
then  I  do  all  the  calculations  and  then  come  up  with  the  conditional 
probability  of  Shakespeare  actually  writing  Hamlet,  the  probability  is 
.2,  so  we  actually  need  a  lot  of  numbers.  We  need  like  likelihood  for 
all  the  human  beings  before  Hamlet  actually  appeared,  is  that  right? 

DR.  LINDLEY:  No,  I  don't  think  that's  right.  Let  me  tell  you  a 
story  about  this.  I  was  talking  with  De  Finetti  once,  and  he  said  to  me 
what's  the  matter  with  you  probabilists?  He  says  you're  always  talking 
about  sigma  algebras.  So  I  said,  well  yes,  we  do.  We  suppose  these 
probabilities  actually  are  in  sigma  algebras.  Why,  he  said,  why  should 
I  have  to  think  about  all  the  events  in  the  sigma  algebras?  Why  can't  I 
think  about  some  of  them.  Surely  that's  relevant  here. 

All  I  need  to  do  is  say  I  have  an  individual  called  Shakespeare, 
and  I  am  concerned  with  whether  that  individual  wrote  the  plays.  The 
other  possibility  is  that  someone  other  than  Shakespeare  wrote  them.  I 
don't  see  why  I  have  to  consider  everybody. 

DR.  KONG:  If  I  do  that,  am  I  sort  of  hiding  some  kind  of 
assumptions  because  every  human  being  apart  from  Shakespeare,  at  least 
it's  possible  that  they  did  write  them. 

DR.  LINDLEY:  But  if  you  now  ask  me  the  question  —  you  are  going 
to find some specific person who was alive, say just about the same time
as Shakespeare, and ask me what is the probability that he wrote Hamlet,
then indeed I shall have to do the sort of calculations you suggest.

But  whilst  my  question  is  did  Shakespeare  write  it  or  did  he  not,  I 
don't  see  why  I  have  to  engage  in  this. 


DR.  KONG:  It  all  depends  on  other  possibilities,  because  like  if 
Shakespeare  is  the  only  human  being  that  existed,  of  course  the 
probability  has  to  be  one  because  nobody  else  can  write  it.  Of  course 
it's very unlikely for him to do it but if he is the only one of course
he  has  to  be  the  one. 

DR.  LINDLEY:  All  the  possibilities  have  been  eliminated. 

DR.  KONG:  Right,  so  it  seems  to  me  that  in  order  to  get  the 
probability  you  still  have  to  consider  the  alternative.  Of  course  we 
don't  consider  each  individual  by  itself  but  we  are  sort  of  making  some 
kind of assumptions there. Sort of coarsening all those individuals
together.

DR.  DeGROOT:  What  you're  stating  is  the  nuisance  parameter 
problem  and  it  is  a  fundamental  problem  in  Bayesian  statistics  and 
Bayesian  methodology  in  general. 

What I sense Dennis is answering is that in some circumstances
one can, mentally if in no other way, simply integrate out; that it's not
necessary to assess distributions on an entire high-dimensional space
if you're only going to be interested in one particular event and the
nature of the information that you have makes it possible to directly
assess the probability of that event.

DR.  KONG:  The  thing  is,  theoretically  I  do  have  a  prior  and 
theoretically  it  may  be  subjective  but  I  do  have  some  kind  of  likelihood 
I can construct and I can look at each human being and then
theoretically  I  can  use  the  Bayesian  formula  and  get  the  answer. 
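
The two routes under discussion, a full prior and likelihood over every candidate author versus a coarsened "Shakespeare or someone else" partition, can be compared in a short sketch. Everything numeric here is hypothetical: the four candidates, the uniform prior and the likelihoods are invented for illustration (and happen to be chosen so the answer is .2). The sketch shows only that the coarsened calculation agrees with the full one when the pooled likelihood is the prior-weighted average over the pooled individuals, which is the assumption the coarsening carries.

```python
from fractions import Fraction as F


def posterior(prior, likelihood, target):
    """Bayes' formula over a finite set of mutually exclusive hypotheses."""
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    return prior[target] * likelihood[target] / evidence


# Hypothetical candidates with a uniform prior and made-up likelihoods.
prior = {"Shakespeare": F(1, 4), "Oxford": F(1, 4), "Other1": F(1, 4), "Other2": F(1, 4)}
likelihood = {"Shakespeare": F(1, 10), "Oxford": F(3, 10), "Other1": F(1, 20), "Other2": F(1, 20)}

full = posterior(prior, likelihood, "Shakespeare")

# Coarsen everyone else into a single alternative "not Shakespeare".
others = [h for h in prior if h != "Shakespeare"]
prior_not = sum(prior[h] for h in others)
lik_not = sum(prior[h] * likelihood[h] for h in others) / prior_not

coarse = posterior(
    {"Shakespeare": prior["Shakespeare"], "not Shakespeare": prior_not},
    {"Shakespeare": likelihood["Shakespeare"], "not Shakespeare": lik_not},
    "Shakespeare",
)
print(full, coarse)
```

Assessing the pooled likelihood directly, rather than building it up person by person, corresponds to integrating the other individuals out.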

DR.  DeGROOT:  All  that  is  true.  I  agree  with  that.  In  fact 
Dennis  challenges  you  can  do  that.  It  seems  to  me  the  way  he's  bringing 
forth  his  evidence  is  almost  perfect  for  belief  functions.  I  have  no 
idea at all how to do it Bayesianly so Dennis has to tell us.

DR. KONG: It seems to me that if it's like belief function it's
like .2. But if it's Bayesian it seems I have a prior I have to like, I
must  have  constructed  the  probability  of  .2  out  of  these  two  pieces  of 
evidence.  I  don't  think  anyone  can  state  clear  out  what  is  the  prior 
and  what  is  the  likelihood  of  each  individual  human  being.  It  is  sort 
of  impossible.  It  has  too  many  parameters. 

DR.  SHAFER:  Not  even  too  many.  Tversky  has  done  some  work  where 
he's  gone  through  and  shown  that  people  do  not  maintain  this  coherence 
that  was  mandated  by  Bayesian.  You  give  them  simple  examples  and  ask 
them  for  priors  and  they  won't  agree  with  Bayesian  statistics. 

How  do  you  capture  these  things?  How  do  you  get  these  numbers? 
People  don't  think  in  these  terms. 


DR. DeGROOT: I'll let you handle the hard ones, gentlemen. I'm
just the moderator here.

DR.  LINDLEY:  My  only  difficulty  in  answering  that  question  is 
that  you  wouldn't  let  me  talk  for  an  hour  and  a  half  on  it. 

The  first  thing  is  that  it  seems  to  me  preposterous  to  expect 
people  to  do  these  things  when  they  haven't  had  any  training  in 
probability. 

To  expect  people  to  be  able  to  do  this  sort  of  thing  without 
training in probability does seem to me to be really rather absurd, so my
attitude  to  all  these  Tversky  results  is  to  say,  well,  yes,  I'm  not 
surprised,  how  could  they  succeed  otherwise. 

The  other  point  is  of  course  this  is  not  a  descriptive  theory. 

It  is  a  normative  theory.  It's  how  you  would  wish  to  behave  if  only  you 
could  do  it. 

Well,  it  may  be,  it  may  be,  that  after  we've  worked  with  this  for 
20 years we discover that you just can't do it. Maybe so, and it would
be  a  great  pity,  but  I  think  we  ought  to  try. 

Just imagine that we weren't in 1984, we were in 1684 and that
Isaac  Newton's  theory  was  hot  off  the  press,  Newtonian  mechanics. 

It  would  be  absurd  to  have  turned  around  to  Newton  and  said,  "Oh, 
your  theory  is  absolutely  useless  because  we  can't  measure  masses  and 
accelerations and these things accurately enough to do the job."

What  you  did  was  to  develop  ways  of  measuring  those  things  and  I 
think  that  the  logic  here  is  so  strong  that  what  we  now  need  to  do  is  to 
put  an  investment  into  whether  these  things  can  be  measured. 

To  expect  a  human  being  to  be  able  to  do  it  without  any  training 
at  all  seems  to  me  to  be  rather  unsatisfactory. 

DR.  LINDLEY:  I  am  delighted  by  the  fact  that  people  can't  do 
this. If they could do all this thing naturally, then I would be out of
a job.

(Laughter.)

It is because they cannot do these things that we probabilists
have  a  potentially  marvelous  tool.  People  cannot  do  these  things  so  now 
we  have  something  to  help  them. 

I  think  that  is  great.  I  think  this  is  one  of  the  greatest 
things of the 20th century. If they could do it naturally then I
wouldn't  be  here. 


PROBABILISTIC  EXPERT  SYSTEMS  IN  MEDICINE:  PRACTICAL  ISSUES 
IN  HANDLING  UNCERTAINTY 


1. INTRODUCTION


The  first  problem  in  discussing  "uncertainty  in  expert  systems" 
comes  in  defining  our  terms.  We  shall  take  a  fairly  narrow  view  of 
'expert  systems'  and  consider  only  programs  that  contain  a 
'knowledge-base'  of  interrelated  propositions,  represented  in  such  a  way 
as  to  be  usable  by  an  'inference  engine'  to  make  some  type  of  human-like 
judgment  concerning  'data'  from  a  new  individual.  Generally,  such  a 
system  is  hoped  to  be  able  to  justify  its  judgments  in  a  manner 
comprehensible  to  the  user,  to  allow  updating  of  its  knowledge  base  in 
response  to  experience,  and  to  be  based  largely  on  the  expressed  opinion 
or observed practice of one or more 'experts.' We shall concentrate
specifically  on  'diagnostic'  systems  whose  aim  is  to  weigh  evidence 
concerning  possible  outcomes  whose  status  is  currently  unknown  -  this 
includes  systems  for  prediction  as  well  as  the  more  standard 
classification  programs. 

Within  this  context  the  term  'uncertainty'  is  commonly  used  in  a 
broad  spectrum  of  qualitative  ways:  for  example,  to  describe 
incompleteness  in  the  knowledge  base  ("I  don't  know  what  we  should  think 
if  X  occurs"),  to  describe  doubt  about  the  structure  of  the  knowledge 
base  ("I'm  not  sure  whether  it  is  reasonable  to  assume  X  and  Y  are 
independent"), to qualify logical implication ("X -> Y with certainty
P"),  to  describe  imprecision  about  the  qualification  ("I'm  not  sure  what 
P  should  be"),  to  describe  ignorance  concerning  the  current  individual 
("I've  no  idea  whether  X  is  true  or  not"),  and  even,  occasionally,  to 
describe  the  probability  that  a  proposition  is  true.  'Uncertainty'  has 
also  been  used  in  describing  the  extent  to  which  a  proposition  is  true, 
although  this  is  an  area  which  appears  to  fall  within  fuzzy  reasoning. 
There  is  often  an  interpretation  in  terms  of  degree  of  support  for  a 
hypothesis,  expressing  the  matching  or  compatibility  of  the  observations 
with  those  expected  were  a  hypothesis  true.  Each  of  these  ideas  has 
been  discussed  from  different  professional  perspectives,  and  the  view 
that  any  deviation  from  a  self-contained  logical  system  is  'uncertainty' 
has  led  to  much  general  discussion  of  multi-valued  logics  of  which 
probability  is  only  one  example.  In  this  paper  it  is  argued  that 
probability  theory  does  indeed  have  a  strictly  limited  role,  but  that 
within these limits it can adopt many of the desirable characteristics
of  methods  adopted  by  others. 

We  shall  concentrate  on  practical,  rather  than  philosophical, 
issues  concerning  the  way  uncertainty  is  handled  in  existing  programs, 
and  do  not  consider  in  detail  either  the  representation  of  knowledge  or 
the  control  of  the  program.  Published  examples  motivate  the  search  for 
a  methodology  that  satisfies  a  number  of  demands,  and  three  current 
projects  will  then  be  used  to  illustrate  some  specific  aspects  of  the 
attempt  to  use  probabilistic  methods  in  as  effective  a  way  as  possible. 
Finally,  an  attempt  is  made  to  bring  the  argument  together  into  a 
prospect  for  future  developments. 


2.  DEMANDS  MADE  OF  A  CALCULUS 


The  particular  complexity  of  many  medical  problems  has  challenged 
the  notion  of  a  rigorous  unified  treatment  of  uncertainty  and,  in 
general,  ad  hoc  quantifications  have  been  used  to  measure  evidence  for 
various  possible  underlying  hypotheses  (Szolovits  and  Pauker,  1978). 

The complex interrelationships between disease processes and
manifestations have led to various systems for propagating degrees of
certainty  and  combining  evidence  from  different  sources  -  PIP  (Pauker  et 
al, 1976) and INTERNIST/CADUCEUS (Miller et al, 1982) both essentially
score  hypotheses  using  evidence  from  current  symptoms  that  support  a 
hypothesis,  which  is  discounted  by  a  score  expressing  absent  symptoms 
that  would  be  expected,  and  a  score  expressing  present  symptoms  that 
would  not  be  expected.  MYCIN/EMYCIN  use  a  more  modular  structure  in 
which  'certainty  factors'  are  propagated,  while  CASNET/EXPERT 
(Kulikowski  and  Weiss,  1982)  propagates  'weights'  through  a  causal 
network. A statistical system such as that of de Dombal et al (1972)
begins  with  'knowledge'  derived  from  a  data  base,  but  the  simplistic 
independence  assumptions  made  in  combining  evidence  (although  effective 
in  discrimination)  ensure  that  the  'certainty'  propagated  is  not 
expected  to  be  interpretable  as  a  probability  -  the  same  holds  for  the 
'Bayesian'  updating  technique  in  PROSPECTOR  (Duda  et  al,  1976).  Fuzzy 
reasoning  (Adlassnig,  1980,  Fieschi  et  al,  1983)  has  also  been  used  as  a 
means  of  capturing  the  ill-defined  nature  of  many  clinical  terms. 

We  can  identify  a  number  of  considerations  that  have  led  to  the 
procedures  that  have  been  adopted  and  that  are  currently  being 
researched.  Strongest  has  been  the  claim  that  a  single  probability  of  a 
hypothesis,  even  if  it  were  based  on  extensive  data,  is  not  sufficient 
to  convince  a  clinician:  the  evidence  on  which  to  base  a  conclusion 
must  be  retrievable,  to  enable  conflicts  and  doubtful  contributions  to 
be  identified.  A  particular  case  of  this  demand  for  justification  is 
the  situation  where  little  relevant  data  is  available  and  there  is 
essentially  ignorance  concerning  the  possibility  of  a  hypothesis.  This 
arises  particularly  often  in  medicine  due  to  the  hierarchical,  taxonomic 
structure  of  disease  descriptions  in  which  evidence  may  be  available 
which  supports  a  general  disease  category  but  gives  no  indication  of  the 
relative  plausibility  of  the  sub-categories  of  disease.  Thus  the 
hierarchical  hypothesis  structure  is  viewed  as  a  natural  justification 
for ranges of uncertainty, for which a number of schemes exist (see, for
example, Quinlan (1983)). The demand that individual contributions of
pieces  of  evidence  should  be  identified,  and  that  evidence  should  be 
able  to  focus  on  groups  of  diseases  without  distinguishing  within  that 
group,  has  led  naturally  to  the  study  of  the  possible  role  of  belief 
functions in medicine (Gordon & Shortliffe, 1984). Much attention is
now  being  paid  to  solving  the  accompanying  computational  problems  and 
making  some  allowance  for  dependencies  between  sources  of  evidence.  The 
concept  of  'discounting'  in  belief  functions  could  also  be  seen  as  a 
means  of  allowing  for  doubt  about  the  precise  numbers  to  be  placed  on 
evidential  statements. 


To  summarize:  current  interest  is  focused  on  schemes  that  can 
propagate  measures  of  uncertainty  through  complex  relationships  often 
defined  on  a  hierarchical  structure,  that  can  identify  conflicting 
evidence  and  lack  of  evidence,  and  can  cope  with  incoming  data  that  do 
not  follow  a  pre-defined  order.  The  reasoning  process  should  be 
justifiable  and  fairly  intuitive,  and  allowance  for  imprecise 
specification  of  numerical  relationships  would  be  an  advantage. 

While  the  above  desiderata  appear  admirable,  we  feel  there  is  an 
important  item  that  has  been  largely  ignored  in  practice.  This  concerns 
the  operational  meaning  of  the  quantities  which  express  uncertainty.  In 
the  following  examples  we  describe  attempts  to  retain  'meaning'  while 
responding  to  demands  and  constraints  made  by  the  real  practical 
problems  of  interest. 

3.  EXAMPLES  OF  PROBABILISTIC  ANALYSIS 

GLADYS  -  The  GLAsgow  DYSpepsia  system 

GLADYS  is  a  program  designed  to  interview  patients  presenting  to 
a  clinic  with  dyspepsia,  and  provide  a  reasoned  probabilistic  diagnosis 
based  on  the  symptoms  alone.  It  was  developed  at  the  Diagnostic 
Methodology  Research  Unit  at  Glasgow,  and  runs  on  a  microcomputer  with  a 
special  keyboard  to  record  patient  responses.  The  control  of  the 
interview  is  strictly  algorithmic,  in  that  branches  to  more  detailed 
interrogation  are  taken  depending  on  the  results  to  'trigger  questions,' 
and  the  interview  has  been  found  to  be  accurate  and  acceptable  (Lucas  et 
al,  1976).  The  responses  are  analyzed  according  to  a  scoring  system 
derived  from  a  modified  logistic  regression  technique,  described  in 
detail in Spiegelhalter and Knill-Jones (1984), of which certain aspects
are  relevant  to  the  issues  raised  in  the  previous  section. 

Firstly,  there  is  a  real  need  to  deal  with  hierarchical  disease 
structures,  in  which  for  example,  certain  features  may  discriminate  the 
generic  class  'peptic  ulcer'  (PU)  from  other  diseases,  while  other  items 
of  information  are  relevant  to  discriminating  duodenal  from  gastric 
ulcer  (GU)  within  the  peptic  ulcer  class.  This  is  accomplished  by 
calculating  probabilities  conditional  on  the  branch  in  the  hierarchy  and 
then  multiplying  downwards  to  obtain  the  overall  probability:  for 
example,  we  calculate  p(GU|PU)  and  p(PU)  from  which  p(GU)  = 
p(GU|PU)p(PU). 
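
The downward multiplication through the hierarchy can be sketched in a few lines of Python. This is only an illustration: the tree layout, the function name `overall_probability` and every number below are hypothetical and are not taken from GLADYS.

```python
# Hypothetical conditional probabilities p(node | parent). The numbers are
# invented for illustration; they are not GLADYS values.
parent = {"PU": None, "GU": "PU", "DU": "PU"}
p_given_parent = {"PU": 0.30, "GU": 0.25, "DU": 0.75}


def overall_probability(node):
    """Multiply conditional probabilities from the node up to the root."""
    prob = 1.0
    while node is not None:
        prob *= p_given_parent[node]
        node = parent[node]
    return prob


p_gu = overall_probability("GU")  # p(GU) = p(GU|PU) * p(PU)
p_du = overall_probability("DU")  # p(DU) = p(DU|PU) * p(PU)
```

Because the conditional probabilities within a branch sum to one, the sub-category probabilities add back up to the probability of the parent class.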

Secondly,  the  scoring  system  allows  explanation  of  the  final 
probability  in  terms  of  the  contributing  pieces  of  evidence.  For 
example, a patient described in Spiegelhalter and Knill-Jones (1984)
provided  the  following  evidence  relevant  to  a  diagnosis  of  gallstones: 


Evidence FOR gallstones             Evidence AGAINST gallstones

History less than 6 months          Pain not severe enough to
Pain comes in 'attacks'               warrant emergency call
Can enumerate attacks                 to doctor                  -43
Attacks produce restlessness        Pain does not radiate        -38
Pain in right hypochondrium

Balance of evidence    +344    (Total evidence = 425 + 81 = 506;
                                conflict ratio = 506/344 = 1.5)
Initial score          -300    (Corresponding to prevalence of 4.7%)
Final score             +44    = 61% chance of gallstones


Some explanation of the above 'explanation' is necessary. The scores
given to findings are 100 log(likelihood ratios) adjusted, roughly
speaking, for correlations between items of information. Thus the
initial score of S = -300 is transformed to a prior probability
p = 1/{1 + exp(-S/100)} = .0474, which is simply the inverse of
S = 100 log{p/(1 - p)}. The 'conflict ratio' is a rough measure of how
much the total evidence obtained contradicts itself: a high ratio, say
above around
2.5,  suggests  the  clinician  should  check  some  of  the  important 
questions.  The  initial  score  is  based  on  a  prevalence  in  an  urban 
clinic  and  could  be  altered  depending  on  circumstances.  The  scores  come 
from  analysis  of  a  data  base  of  1200  cases  and  the  statistical  modeling 
means  the  final  probabilities  are  reasonably  'well-calibrated',  in  that 
of patients presenting as above, around 60% should turn out to have
gallstones  as  a  major  cause  of  their  symptoms.  There  is,  however,  no 
reason  why  the  scores  should  not  be  subjectively  assessed  provided  one 
could  ensure  the  predictions  had  similar  properties  of  calibration. 
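
The arithmetic of the gallstones example can be retraced with a short Python sketch. The totals (425 for, 81 against, initial score -300) are those quoted above; the function and variable names are ours, and the logarithm is assumed to be natural.

```python
import math


def score_to_probability(score):
    """Invert S = 100 log{p/(1 - p)}, assuming natural logarithms."""
    return 1.0 / (1.0 + math.exp(-score / 100.0))


def probability_to_score(p):
    """Score (100 times the log-odds) corresponding to probability p."""
    return 100.0 * math.log(p / (1.0 - p))


# Totals quoted in the gallstones example.
evidence_for = 425
evidence_against = 81
initial_score = -300

balance = evidence_for - evidence_against          # net evidence
total_evidence = evidence_for + evidence_against   # evidence regardless of sign
conflict_ratio = total_evidence / balance          # large when evidence conflicts
final_score = initial_score + balance

prior = score_to_probability(initial_score)
posterior = score_to_probability(final_score)
```

With these inputs the balance is 344, the conflict ratio is roughly 1.5, and the final score of 44 corresponds to a probability of about .61.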

Thirdly, imprecision of the quantification could be incorporated
by placing standard errors on the predictions; the above example has a
standard error of 42 on the final score corresponding to a rough 95%
interval of (.40, .78) on the predictive probability. Finally,
ignorance may be viewed 'retrospectively' in terms of the 'total
evidence' received either for or against a proposition. However, as
suggested in Spiegelhalter and Knill-Jones (1984), we may also quantify
'prospective ignorance' in terms of the results that may occur when the
data of which we are currently ignorant become available. This concept
translates into calculating the predictive distribution of the possible
final probabilities that may be ascribed to a disease. For example,
before an interview starts, Figure 1 displays the distribution of final
scores for gallstones among those with and without gallstones. Tukey
(1984) recommended that such distributions should be included as part of
the explanation facilities.


Thus before an interview, a patient has a fairly precise probability of
gallstones (95% interval .03, .07) but one based on an ignorance
reflected in the wide distribution of feasible probabilities that could
be taken on when data become available; while at the end of the
interview, there is a relatively imprecise probability with a 95%
interval of (.40, .78), but no remaining ignorance within the bounded
context of the system.
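
The end-of-interview interval can be reproduced, at least roughly, by taking the final score plus or minus two standard errors and mapping the endpoints to the probability scale. The sketch below assumes that construction (the multiplier of 2 and the function names are our assumptions, not a documented part of GLADYS).

```python
import math


def score_to_probability(score):
    """Map a score (100 times the log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-score / 100.0))


def probability_interval(score, standard_error, z=2.0):
    """Rough interval: score +/- z standard errors, mapped to probabilities."""
    lower = score_to_probability(score - z * standard_error)
    upper = score_to_probability(score + z * standard_error)
    return lower, upper


# Final score 44 with standard error 42, as in the gallstones example.
low, high = probability_interval(44, 42)
```

The interval is symmetric on the score scale but not on the probability scale, which is why a point estimate of .61 sits off-centre within (.40, .78).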

We  would  not  normally  consider  GLADYS  as  an  'expert  system'  since 
it  does  not  use  knowledge  representation  techniques  derived  from  AI,  it 
is  not  based  on  'expert  opinion'  and  it  does  not  operate  interactively. 
However,  many  of  our  aims  match  those  of  'classic'  expert  systems, 
except  that  we  are  determined  to  remain,  as  far  as  possible,  within  a 
probabilistic  framework. 

A  diagnostic  system  for  chest  diseases 


A  group  at  the  Chest  Clinic,  Westminster  Hospital  are  developing 
a  system  for  probabilistic  diagnosis  of  patients  presenting  with  a 
normal  chest  X-ray.  The  system  uses  simple  Independent  Bayes  updating 
assuming  mutually  exclusive  disease  categories,  and  our  only  concern 
here  is  with  the  subjective  probability  assessments  on  which  the  system 
is  initially  based.  The  consultant  physician  has  been  required  to 
assess  prior  probabilities  for  each  of  the  diseases  conditional  on  the 
age  group  of  the  patient  and  the  main  presenting  symptoms,  as  well  as 
the  probabilities  of  the  secondary  symptoms  conditional  on  each  of  the 
diseases.  Around  each  probability  he  was  required  to  place  an  interval 
reflecting  his  confidence  in  the  point  probability.  By  viewing  this 
range as an approximate 90% interval around a binomial probability one
can derive a rough 'implicit sample size' on which his judgment of each
probability has been based. These measures of imprecision are currently
not propagated through the consultation, although Rauch (1984) suggests
ad hoc methods of doing this while allowing for correlated judgments.
However, the implicit sample sizes allow the probabilities to be stored
as a fraction r/n, and where a confirmed case with the relevant symptom
is found the probability may be updated to (r + 1)/(n + 1). This
emphasizes that probabilistic systems may be based on subjective
opinion, and yet a rational means of allowing that opinion to learn from
experience is easily available.
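
One way the implicit sample size and the updating rule might be implemented is sketched below. The normal approximation to the binomial, the function names and the example assessment are all assumptions made for illustration; the update for a confirmed case without the symptom, r/(n + 1), is the natural complement of the rule stated above rather than something the text spells out.

```python
def implicit_sample_size(p, lower, upper, z=1.645):
    """Sample size n for which p +/- z*sqrt(p(1-p)/n) spans (lower, upper).

    Assumes a normal approximation to the binomial; z = 1.645 corresponds
    to a two-sided 90% interval.
    """
    half_width = (upper - lower) / 2.0
    return p * (1.0 - p) * (z / half_width) ** 2


def update_with_case(r, n, symptom_present):
    """Add one confirmed case to the stored fraction r/n."""
    return (r + 1 if symptom_present else r), n + 1


# Hypothetical assessment: p = 0.30 with a 90% range of (0.20, 0.40).
n = round(implicit_sample_size(0.30, 0.20, 0.40))
r = round(0.30 * n)
r_new, n_new = update_with_case(r, n, symptom_present=True)
```

A narrow stated range yields a large implicit n, so that a single new case moves the stored probability very little; a wide range yields a small n and a correspondingly faster response to experience.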

IMMEDIATE  -  a  system  for  general  practice 


In  contrast  to  GLADYS,  IMMEDIATE  is  a  rule-based  AI  system 
written  in  PROLOG  which  is  being  developed  by  a  group  centered  at  the 
Medical  Computation  Unit  at  the  University  of  Manchester.  It  is 
designed  to  support  certain  activities  of  general  practitioners  and  its 
control philosophy is described elsewhere (Dodson & Rector 1984).




Two  aspects  of  its  development  are  of  interest  here.  Firstly, 
although the knowledge structure and uncertainty propagation bear some
resemblance to those of PROSPECTOR, a deliberate aim is that the
probabilities  should  be  made  to  'cohere':  thus  initial  probability 
judgments  should  form  a  valid  joint  distribution,  and,  as  data  arrives, 
uncertainty  be  propagated  in  a  way  that  retains  its  interpretation  as 
subjective  probability.  Secondly,  part  of  the  control  mechanism  is 
based  on  a  range  of  'ignorance'  or  'evidence  availability'  that  is  an 
explicit  calculation  of  the  maximum  and  minimum  probabilities  of  a 
proposition that could be achieved when further information becomes
available.  This  may  be  seen  as  a  summary  measure  of  the  predictive 
distributions  of  final  probabilities  described  under  GLADYS.  Explicitly 
calculating  the  range  of  potential  probabilities  of  a  proposition  helps 
towards  an  assessment  of  the  importance  of  establishing  relevant  patient 
characteristics,  which  in  turn  ensures  that  the  clinician  is  informed  as 
to  the  most  telling  questions  to  ask. 
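
The idea of a range of achievable probabilities can be illustrated with a small Python sketch. It is not a description of how IMMEDIATE performs the calculation: we assume, purely for illustration, GLADYS-style additive scores and an exhaustive enumeration of the answers to the outstanding questions, and the question names and weights are invented.

```python
import math
from itertools import product


def score_to_probability(score):
    """Map a score (100 times the log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-score / 100.0))


def probability_range(current_score, unanswered):
    """Smallest and largest final probability over all possible answers.

    `unanswered` maps each outstanding question to the pair
    (score_if_yes, score_if_no) that its answer would add; scores are
    assumed to combine additively.
    """
    finals = [
        score_to_probability(current_score + sum(choice))
        for choice in product(*unanswered.values())
    ]
    return min(finals), max(finals)


# Invented weights for two outstanding questions.
unanswered = {"pain radiates": (60, -40), "jaundice": (150, -20)}
low, high = probability_range(44, unanswered)
```

Once every question has been answered the dictionary is empty and the range collapses to the current probability, which matches the notion of no remaining ignorance within the bounded context of the system.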

4.  DISCUSSION

The  preceding  section  is  an  inadequate  indication  of  the  work 
currently  being  carried  out  in  probabilistic  systems,  and  we  have  only 
been  able  to  mention  aspects  according  to  their  capacity  to  illustrate 
the  practical  implementation  of  important  issues  in  the  handling  of 
uncertainty.  In  this  section,  we  attempt  to  summarize  these  issues, 
with  the  aid  of  examples  drawn  from  the  systems  introduced  above. 

Status  of  Propositions 


It  is  clearly  preferable  that  all  propositions  in  a  system  are 
crisply  defined  and,  at  least  theoretically,  verifiable  at  some  point  in 
the future, as required by Smith (1965) or de Finetti (1974).
Nevertheless, the inevitable imprecision of statements (e.g. "the pain
is relieved by food") makes it tempting to allow degrees of truth of
propositions and adopt a fuzzy calculus. It should, however, be
emphasized  that  it  is  not  the  true  state  of  the  world  to  which  the 
system  has  access,  but  the  assertion  of  the  state  of  the  world  (The 
patient  has  replied  YES  to  the  question  "Is  the  pain  relieved  by 
food?"),  and  this  is  necessarily  made  'crisp'  by  the  restricted  means 
one  has  to  put  information  into  the  system  (e.g.  just  a  YES/NO  button). 
An  expert  system  can  therefore  force  the  user  to  be  categorical  in  his 
assertions,  although  we  acknowledge  that  user  demand  for  qualifications 
of  'degree'  may  create  the  need  for  an  alternate  calculus  to  deal  with 
'partly-true' propositions.




A  statistician  may  tend  to  view  a  knowledge-base  as  a  set  of 
related  'nodes',  each  corresponding  to  a  random  variable  which  may  take 
on  a  number  of  mutually  exclusive  and  exhaustive  values.  The  'rules' 
attempt  to  define  a  distribution  on  the  variables.  For  control 
purposes,  however,  it  may  be  necessary  to  have  'action'  nodes  which 
correspond  to  conclusions  on  which  further  analysis  is  conditioned. 

These may well not be strictly verifiable propositions; for example, in
a system designed for statistical analysis, there may be assertions of
'normal errors' or 'linear relationship'. Strictly speaking a
decision-theoretic argument should be used for any interim decision made
in the control of a consultation, but this is not usually practicable.
Instead, it may well be reasonable to adopt a calculus of
'compatibility' or 'degree of support' for a hypothesis for which a
probability is not well-defined.

Knowledge  Representation  and  Explanation 

We  feel  that  probabilistic  methods  can  handle  hierarchical 
taxonomic  structures  without  extending  into  belief  function  methodology. 
There  is,  however,  a  great  need  for  further  work  in  coherent  assessment 
and  propagation  of  probabilities  through  the  network  structures  arising 
from  rule-based  systems.  The  graphical  representations  of  certain 
log-linear  models  described  by,  for  example,  Wermuth  &  Lauritzen  (1983) 
appear  to  be  relevant,  with  propagation  schemes  extended  from  those  of 
Kim  &  Pearl  (1983).  Subjective  judgments  may  be  deliberately 
over-specified  to  allow  for  identification  of  incoherence  due  to  poor 
assessments  or  weak  modeling,  or  underspecified  and  'padded  out'  using, 
for  example,  the  maximum  entropy  methods  of  Cheeseman  (1983)  and 
Konolige  (1982).  Using  such  a  structure  and  explanation  facilities 
similar to GLADYS, one should be able to fulfill the aim, described by
Dempster (1985), of justifying quantified judgment explicitly in terms of
the  sources  of  evidence. 

Intervals  and  Probabilities 

As  we  emphasized  in  discussing  GLADYS,  two  types  of  range  of 
probability  must  be  distinguished.  The  first,  due  to  inadequacies  in 
the  knowledge  base,  concerns  the  imprecision  in  the  quantifications. 

This  may  be  represented  by  a  standard  error  or  even  a  fuzzy  qualifier  in 
the  manner  suggested  by  Freeling  (1981),  but  in  either  case  the  range 
represents  a  type  of  automatic  sensitivity  analysis  conditional  on  the 
data  already  obtained.  This  interval  tends  to  widen  as  more  data  come 
in. 

This should be contrasted with an interval based on ignorance concerning
the current case, and one way in which this can be defined is in terms
of the probabilities that could be taken on when the unknown data,
denoted X, become available. If D represents a disease with current
probability p(D), then the predictive distribution of the eventual
probability p(D|X) may either be fully calculated as in GLADYS or
summarized by its range as in IMMEDIATE. We note that

    E[p(D|X)] = ∫ p(D|X) p(X) dX

              = ∫ p(X|D) p(D) dX      by Bayes theorem

              = p(D)

Hence  our  current  probability  may  simply  be  thought  of  as  the  mean  of 
the  distribution  of  possible  final  probabilities.  This  distribution 
narrows  as  the  consultation  proceeds. 
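
The identity is easy to check numerically for a discrete X. In the sketch below every number is hypothetical: a single outstanding finding with three possible values, and a current probability p(D) = 0.2.

```python
# Hypothetical numbers: one unknown finding X with three possible values.
p_d = 0.2
p_x_given_d = {"a": 0.7, "b": 0.2, "c": 0.1}
p_x_given_not_d = {"a": 0.1, "b": 0.3, "c": 0.6}

# Marginal p(X) and posterior p(D|X) by Bayes theorem.
p_x = {x: p_x_given_d[x] * p_d + p_x_given_not_d[x] * (1 - p_d)
       for x in p_x_given_d}
p_d_given_x = {x: p_x_given_d[x] * p_d / p_x[x] for x in p_x}

# Mean of the predictive distribution of the final probability.
expected_final = sum(p_d_given_x[x] * p_x[x] for x in p_x)

# Range of the final probability, the summary used for 'ignorance'.
lowest = min(p_d_given_x.values())
highest = max(p_d_given_x.values())
```

The mean of the possible final probabilities equals the current p(D), while the spread between the lowest and highest values indicates how far the outstanding finding could still move the diagnosis.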




In  this  way  ignorance  is  explicitly  defined  in  terms  of  the  X 
that  we  do  not  yet  know.  In  real  life,  X  is  unbounded  and  so  such  a 
calculation  is  unreasonable,  but  it  is  important  to  note  that  an  expert 
system  is  bounded  and  so  can  always  explicitly  state  what  information  is 
missing,  provided  a  suitably  efficient  search  routine  is  available. 

Operational  Meaning 

Our  practical  experience  has  strongly  influenced  us  towards 
establishing  operational  meaning  to  our  expression  of  uncertainty.  This 
has  three  stages:  firstly,  the  inputs,  based  on  either  real  or 
'imaginary'  past  data,  must  have  sufficient  interpretation  to  allow 
informed  argument.  Clinicians  often  disagree  strongly  about 
frequencies,  but  we  have  found  the  resulting  discussions  illuminating 
and constructive: the problem of agreeing on numbers with no
verifiable interpretation is vividly illustrated in the fascinating
transcript of an argument concerning 'certainty factors' contained in
the recent book on the MYCIN projects (Buchanan & Shortliffe, 1984).
Secondly,  preserving  operational  meaning  in  the  propagation  of 
uncertainty  requires  attention  to  the  coherence  of  the  assessments  when 
placed  in  a  large,  complex  knowledge-base.  Finally,  the  outputs  need  to 
have  an  externally  verifiable  interpretation  in  terms  of  their 
'calibration'  against  experience.  Such  calibration  is  not  part  of  the 
axioms  of  subjective  probability,  but  we  have  found  an  enthusiastic 
response  from  clinical  colleagues  when  they  find  the  predictions  provide 
reasonable  'betting  odds'.  Of  course,  a  system  may  process  information 
solely  with  the  aim  of  providing  a,  possibly  ranked,  set  of  alternatives 
with  some  attached  measure  of  evidential  support.  However,  if  a  system 
is  to  be  used  to  guide  the  choice  of  an  option,  this  seems  to  be 
inadequate.  In  fact,  a  subjectivist  statistician  may  view  a  diagnostic 
expert  system  as  a  'coherence  machine',  which  takes  in  relevant 
information,  and  throws  out  acceptable  betting  odds  on  future  events. 

Finally,  perhaps  the  most  important  reason  for  interpretable 
quantification  is  the  need  for  learning.  As  we  have  illustrated  with 
the  chest  disease  system,  updating  of  subjective  probabilities  is 
feasible  and  should  provide  a  convergence  of  opinion  that  may  overcome 
local  biases  which  may  otherwise  render  a  system  unacceptable. 


Figure 1.  Empirical predictive distribution of final score on
gallstones: 1119 cases of 'not gallstones'; 57 cases of gallstones
(shaded). [Vertical axis: frequency (%). Horizontal axis: final score,
with the corresponding probability scale .01, .02, .05, .12, .27, .5,
.73, .88, .95, .98.]


BIBLIOGRAPHY 


Adlassnig, K.P. (1980). A fuzzy logical model of computer-assisted
medical diagnosis. Methods Inf. Med., 19, 141-148.

Buchanan, B.G. and Shortliffe, E.H. (1984). Rule-based Expert Systems:
the MYCIN Experiments of the Stanford Heuristic Programming Project.
Reading: Addison-Wesley.

Cheeseman, P. (1983). A method of computing generalized Bayesian
probability values for expert systems. In Proceedings of 8th
International Joint Conference on Artificial Intelligence, Karlsruhe,
West Germany, pp. 198-202.

de Dombal, F.T., Leaper, D.J., Staniland, J.R., McCann, A.P. and
Horrocks, J.C. (1972). Computer-aided diagnosis of acute abdominal
pain. Brit. Med. J., 2, 9-13.

de Finetti, B. (1974). Theory of Probability, Vol. 1. London: Wiley.

Dempster,  A.P.  (1985).  Probability,  evidence  and  judgment.  In  Bayesian 
Statistics  2  (J.  Bernardo  et  al,  eds.)  (to  appear). 

Dodson, D.C. and Rector, A.L. (1985). Importance-driven distributed
control of diagnostic inference. In Research and Development in Expert
Systems (Bramer, M.A. ed). Cambridge University Press: Cambridge,
England.

Duda, R.O., Hart, P.E. and Nilsson, N.J. (1976). Subjective Bayesian
methods for rule-based inference systems. Proc. AFIPS Nat. Comput.
Conf., 47, 1075-82.

Fieschi, M., Joubert, M., Fieschi, D., Botti, G. and Roux, M. (1983). A
program for expert diagnosis and therapeutic decision. Medical
Informatics, 127-135.

Freeling,  A.N.S.  (1981).  Alternative  theories  of  belief  and  the 
implications  for  incoherence,  reconciliation  and  sensitivity  analysis. 
Decision  Science  Consortium. 

Gordon, J. and Shortliffe, E.H. (1984). The Dempster-Shafer theory of
evidence. In Buchanan & Shortliffe (1984), 272-292.

Kim, J.H. and Pearl, J. (1983). A computational model for causal and
diagnostic reasoning in inference systems. In Proceedings of 8th
International Joint Conference on Artificial Intelligence, Karlsruhe,
West Germany, pp. 190-193.

Konolige, K. (1982). Bayesian methods for updating probabilities.
Final Report, Project 6415, SRI International.


Kulikowski, C.A. and Weiss, S.M. (1982). Representation of expert
knowledge for consultation: the CASNET and EXPERT projects. In
Artificial Intelligence in Medicine (Szolovits, P. ed). Colorado:
Westview Press, 21-55.

Lucas, R.W., Card, W.I., Knill-Jones, R.P., Watkinson, G. and Crean, G.P.
(1976). Computer interrogation of patients. Brit. Med. J., 2, 623-625.

Pauker, S.G., Gorry, G.A., Kassirer, J.P. and Schwartz, W.B. (1976).
Towards the simulation of clinical cognition: taking a present illness
by computer. Amer. J. Med., 60, 981-986.

Quinlan,  J.R.  (1983).  Inferno:  a  cautious  approach  to  uncertain 
inference.  The  Computer  Journal  26,  255-269. 

Rauch, H.E. (1984). Probability concepts for an expert system used for
data fusion. AI Magazine, 55-60.

Smith, C.A.B. (1961). Consistency in statistical inference and decision
(with discussion). J. Roy. Stat. Soc., 23, 1-25.

Spiegelhalter, D.J. and Knill-Jones, R.P. (1984). Statistical and
knowledge-based approaches to clinical decision-support systems, with an
application in gastroenterology (with discussion). J. Roy. Stat. Soc.,
A, 147, 35-77.

Szolovits, P. and Pauker, S.G. (1978). Categorical and probabilistic
reasoning in medical diagnosis. Artificial Intelligence, 11, 115-144.

Tukey, J.W. (1984). Discussion of Spiegelhalter and Knill-Jones (1984).

Wermuth,  N.  and  Lauritzen,  S.L.  (1983).  Graphical  and  recursive  models 
for  contingency  tables.  Biometrika,  70,  537-52. 




TRANSCRIPT OF ORAL PRESENTATION BY DAVID SPIEGELHALTER:
PROBABILISTIC EXPERT SYSTEMS IN MEDICINE, PRACTICAL ISSUES IN
HANDLING UNCERTAINTY


DR. SPIEGELHALTER: I am an applied medical statistician working
in the MRC Biostatistics Unit in Cambridge. But I have a long interest
in decision making, since I was reared and indoctrinated at University
College,  London  under  Dennis  Lindley  and  Adrian  Smith.  Recently  I  have 
become  involved  in  a  number  of  projects  where  people  have  been 
attempting  to  apply  the  techniques,  or  at  least  some  of  the  ideas,  of  AI 
in  medical  diagnostic  problems. 




The  first  problem  in  talking  about  uncertainty  in  expert  systems 
is  defining  expert  systems  and  defining  uncertainty.  And  we  have  had 
quite  a  bit  of  talk  about  uncertainty  yesterday,  but  not  a  lot  really 
about  what  expert  systems  are. 

Firstly  I  want  to  say  what  I  feel  are  the  aims  of  expert  systems 
in  medicine.  My  examples  will  all  be  taken  from  medicine  although  I 
hope  that  a  lot  of  the  things  I  talk  about  are  applicable  much  more 
generally. In particular, I am interested in "classification" types of
diagnostic expert systems, not critiquing systems designed to comment on
a proposed course of action. They are generally there in order to make
some sort of judgments about some unknown aspect of some individual
person, which might be a diagnosis or a prognosis.

The basic structure consists of a knowledge base, kept separate
from an inference engine, which controls the process by which the
knowledge is used in order to make some sort of judgment on a new
individual from whom data is obtained. These are just sort of buzz
words which can mean all sorts of things in different applications.

What is often considered a necessary characteristic of an expert
system is that they should be able to justify their reasoning by making
the process by which they obtain their judgments explicit,
interpretable, and understandable to the user. They must be able to
justify their conclusions.

To  some  degree  the  knowledge  base  will  be  based  on  expert 
opinion.  Whether  that  is  quantitative  expert  opinion,  or  only 
qualitative  expert  opinion  in  relation  to  the  structure  of  the  knowledge 
base,  will  depend,  again,  on  application  to  application. 

People  in  AI  say  that  you  should  be  able  to  update  a  knowledge 
base  from  experience,  especially  from  its  failures  as  well  as  from  its 
successes,  and  we  should  be  able  to  learn  in  some  way.  And  the  control 
of  the  consultation  (consultation  perhaps  should  be  in  quotes)  will  be 
largely  based  on  some  heuristic  techniques  which  attempt  to  bear  some 
resemblance  to  how  an  expert  may  attempt  to  solve  the  problem.  So  these 
are some very general phrases that say what might be the basic aims that
we are trying to fulfill.

Now, I am going to talk completely about applications, about past
applications and about work in which I am involved. These applications
do cast light on some of the theoretical issues that were being
discussed yesterday.

Before getting to uncertainty, I would like to talk about
knowledge representation. Here is one example of representation of
medical knowledge with an expert system, an old one, the CASNET system
developed at Rutgers. In this case an explicit attempt is made to
represent the physiological causal knowledge that clinicians have about
the disease glaucoma. (see slide 1)

In particular, we can see that there is a plane of observations.
These are the actual variables, the data that can be observed from the
individual. And these are considered as being caused by some
unobservable patho-physiological states which in turn are caused by the
deepest level of the underlying unobservable disease states. It is
these that we are really interested in, but we can only observe the
diseases through this intermediate layer.

There are all sorts of ways statisticians might see that as being
in terms of latent class models, in order to allow for dependence by
putting in intermediate states. A less structured system is something
like MYCIN, where the knowledge consists of little chunks of production
rules which relate to particular groups of findings. There is some
underlying structure but not very much.

Getting onto some things that are a bit more complicated, this is
just a part of the structure of INTERNIST or CADUCEUS. This may just
look like a lot of gibberish. But I find when I put this up in front of
clinicians, after a bit they start seeing that this does make sense.
There are certain aspects of this that relate to what Glenn Shafer was
saying yesterday. (see slide 2)

First of all, CADUCEUS is this massive system with 500 diseases
and 3000 possible symptoms and is supposed to cover most problems in
internal medicine. But looking at the knowledge structure that is used,
I think, is important, because this relates to many problems in
medicine. The diseases, which are the blocks here, the pathological and
nosological descriptions, can be related very often in some hierarchical
taxonomic structure.

So we have here this connector which is a subclassification. We
have hepatocellular involvement, of which a particular type is fibrotic
hepatocellular involvement, of which a particular type is cirrhosis, of
which a particular type is biliary cirrhosis. So very often we do get
this hierarchical representation of the underlying diseases.


Superimposed on that you have a causal network where, in
particular, there is a "caused by" link that is obtained so you can see
the upper gastrointestinal hemorrhage can be caused by the portal
hypertension. So what that is doing is having an underlying taxonomic
disease structure with a causal network superimposed which affects
different levels of that structure. That is exactly the picture that
Glenn Shafer put up yesterday where particular items of evidence may
tell you about particular levels of the disease, and that recurs many
times.

So that is an example of the sort of flexibility, the structuring
that people try to put into expert systems in medicine.

What  about  uncertainty?  Well,  before  becoming  technical  about 
it, I will start out in a linguistic way describing how people might use
the word "uncertainty" in describing attempts to represent expert
knowledge. Here are just a few of them and you can just keep on going
on  the  list.  One  type  of  uncertainty  has  to  do  with  incompleteness  of 
our  knowledge,  that  in  a  particular  set  of  circumstances  we  have  not 
provided  for  the  inference.  What  do  we  think  if  X  occurs? 

Another  type  of  uncertainty  could  be  doubt  about  actual 
qualitative  structure  of  that  knowledge  base.  Is  it  reasonable  to 
assume  that  two  things  are  independent  or  not?  Another  type  of 
uncertainty,  and  generally  the  type  that  is  very  often  discussed  in 
relation  to  production  rules,  is  sometimes  called  the  degree  of 
implication.  That  is,  X  implies  Y  with  some  attached  measure  or 
uncertainty  factor,  or  whatever. 

Another  type  of  uncertainty  is  the  imprecision  about  this 
quantification.  Another  type  of  uncertainty  is  ignorance,  where  you 
have  no  information  about  whether  a  particular  disease  is  present  or 
not.  This  is  often  used  in  hierarchical  structures,  because  you  may 
have  information  at  one  level  of  hierarchy  and  not  be  able  to  say 
anything  lower  down  the  hierarchy  -  this  example  was  shown  yesterday. 

Of  course,  one  might  even  talk  of  the  probability  in  terms  of, 
say,  betting  odds  on  X  being  true.  There  are  other  ways  in  which  the 
word  uncertainty  is  used.  It  is  often  used  in  terms  of  the  imprecision 
of  a  proposition,  the  degree  of  truth,  or  the  extent  to  which  X  is  true. 
It  is  often  used  in  terms  of  degree  of  support  of  the  evidence  for  an 
underlying  disease.  I  am  not  saying  the  probability  of  the  disease,  but 
just some degree of matching or some degree of compatibility. It is
often  used  as  well  about  uncertainty  in  terms  of  action,  e.g.  "I  don't 
know  whether  it  is  reasonable  to  assume  X  from  now  on".  This  is  in 
terms  of  control  strategies.  There  may  be  an  uncertainty  about  going 
down  a  particular  path  i.e.,  "Is  this  a  reasonable  thing  to  do  or  not?" 
Cohen's  endorsement  work  seems  to  use  this  interpretation  a  lot. 

I just want to show the ways in which these terms are used. I
have  found  in  going  to  meetings  that  it  is  sometimes  very  difficult  to 
work  out  what  people  are  talking  about  because  of  the  wide  range  of 
descriptions  people  use.  Different  people  from  different  professions 
have  different  ways  of  approaching  the  subject. 




People from the natural language area may try to imitate how
people use these phrases. So what I would like to emphasize here is
that I am using uncertainty in a very particular way. I am talking
about probability, but argue that probability is more flexible than has
usually been considered.

Let's look at uncertainty and how it is handled in some working
systems. Here is a trivial system but one that is actually used in a
number of British hospitals. A patient comes in with acute abdominal
pain and the casualty officer, in the accident and emergency department,
rings the relevant symptoms on the form. The rings go through on a
sheet of paper with numbers on it. He types the numbers into a micro -
they have been using an Apple or Commodores. At the bottom of the
printout comes the probability that they have appendicitis, that they
have pancreatitis, et cetera. This is a statistical system, which
generates probabilities and is in use. There is a big trial going on
that has just finished, where 16,000 people have gone through this
system to see whether or not it makes any difference at all to their
health. We are still trying to work out whether it does.

I was not responsible for the design of the system, I am only
involved in the evaluation. But the way it works is usually known as
Idiot's Bayes in the trade, or conditional independence. It is using
Bayes' theorem, assuming conditional independence within disease groups.
Essentially, the knowledge in the system, which is barely worth the
name, is just a matrix of conditional probabilities, saying for each
disease what proportion of them have any particular characteristic that
will be shown, based on past data. (see slide 3)

DR. COHEN: Is this assuming the diseases are mutually exclusive?

DR. SPIEGELHALTER: Yes, this is assuming the diseases are
mutually exclusive and exhaustive and symptoms are conditionally
independent given the diseases. About 1000 papers have been published
using this Idiot's Bayes method, but there has been very little effect
on clinical practice. So someone with appendicitis, their prior
probability is .26 and these likelihoods are put in, for example 23
percent of appendicitis patients have right lower quadrant pain, in
fact. This is just three symptoms, but in fact there could be more
going in here. They are all multiplied up. You get a total which is
normalized down to one and this comes out as the probability. So that
is the simplest model that is very frequently used in --

DR. SINGPURWALLA: What is the data on the new individual?

DR. SPIEGELHALTER: These are the findings on the new individual.
If someone comes in and they are female and they are age 16, they have
got right lower quadrant pain, et cetera, and there are lots more
findings than this.




DR. SINGPURWALLA: Are those numbers, .49 and so on, the
proportion of females?

DR. SPIEGELHALTER: No, the proportion of people with
appendicitis who are female. So the model says you multiply together
the likelihoods and the prior, normalizing to one, and that is the
approach. It is very crude, over-simplistic statistical modeling but
one that is very often done.
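
The calculation just described can be sketched in a few lines of Python. This is only an illustration of "multiply the prior by the likelihoods and normalize": the .26 prior for appendicitis is the figure quoted in the talk, while the second disease group, the likelihood values and all names below are made up for the example and are not taken from the slide.

```python
def idiots_bayes(priors, likelihoods, findings):
    """Posterior over mutually exclusive, exhaustive diseases.

    priors:      {disease: P(disease)}
    likelihoods: {disease: {finding: P(finding | disease)}}
    findings:    findings observed on the new patient
    """
    unnormalized = {}
    for disease, prior in priors.items():
        product = prior
        for finding in findings:
            # Conditional independence: simply multiply the likelihoods.
            product *= likelihoods[disease][finding]
        unnormalized[disease] = product
    total = sum(unnormalized.values())
    # Normalize so the posteriors add up to one.
    return {d: v / total for d, v in unnormalized.items()}


# Illustrative numbers only, apart from the .26 prior quoted in the talk.
priors = {"appendicitis": 0.26, "other": 0.74}
likelihoods = {
    "appendicitis": {"female": 0.45, "rlq_pain": 0.70},
    "other": {"female": 0.55, "rlq_pain": 0.20},
}
posterior = idiots_bayes(priors, likelihoods, ["female", "rlq_pain"])
```

Every extra finding multiplies in another factor, which is why correlated findings push the output towards 0 or 1.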

And the explanation will give you the symptoms that you typed in,
and the probabilities attached to these diseases. Now, one might see
that the symptoms are not particularly independent, given the disease.
For example, the questions "did they have previous abdominal surgery"
and "have they got an abdominal scar" are both there and you sort of
think they might be slightly related, even within disease classes. So
the effect is that you might stick probabilities in and you may process
them using some sort of statistical model, but by the time they come
out, the numbers don't resemble probabilities in any sense that you
would want to bet on them, and in general because of double counting of
evidence the probabilities are too extreme. You have got a lot of 99
percent chances of appendicitis, but less than 99% are correct.

So you can't really trust these numbers. They provide a ranking,
some measure of evidence for the disease, but they are not really
probabilities by the time you are done.

The knowledge base of that acute abdominal pain system is simply
a disease and a lot of conditionally independent symptoms with no
additional structure. PROSPECTOR will provide a deeper structure in
which you have a series of implications between nodes, drawn up as an
inference network, but it essentially tries to use a similar calculus.
Notice in that previous system the actual updating mechanism was
multiplying by likelihood ratios. In PROSPECTOR, similarly, the number
attached to each link between one finding f and a conclusion c is a
likelihood ratio, so if f is true you multiply the odds on c by two and
if f is false you multiply the odds on c by another factor. If the
finding has a probability of being true you take some sort of weighted
average.
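
A rough sketch of that updating rule follows. It is not PROSPECTOR's actual code, and its real interpolation scheme for uncertain findings is not reproduced here: the plain probability-weighted average in `update_uncertain` is an assumption standing in for "some sort of weighted average", and the function names and example values are hypothetical.

```python
def to_odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def update_certain(prior_p, lr_true, lr_false, finding_true):
    """Multiply the odds on c by the ratio matching the observed finding f."""
    ratio = lr_true if finding_true else lr_false
    return to_prob(to_odds(prior_p) * ratio)


def update_uncertain(prior_p, lr_true, lr_false, p_finding):
    """Assumed rule: average the two certain-case posteriors,
    weighted by the probability that the finding is true."""
    p_if_true = update_certain(prior_p, lr_true, lr_false, True)
    p_if_false = update_certain(prior_p, lr_true, lr_false, False)
    return p_finding * p_if_true + (1.0 - p_finding) * p_if_false


# Hypothetical link: odds doubled if f is true, halved if f is false.
p_true = update_certain(0.2, 2.0, 0.5, True)
p_half = update_uncertain(0.2, 2.0, 0.5, 0.5)
```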

So it is a sort of vaguely Bayesian system, but also in this case
by the time you have gone through the network the numbers don't really
resemble anything you would want to call probabilities.

It was mentioned yesterday, the MYCIN people were very interested
in looking at belief functions and here is an example that they have
talked about. They would like to use belief functions within MYCIN, as
a response to the hierarchical nature of the disease hypotheses and the
fact that certain pieces of evidence hit different levels of the
hierarchy.


So here you have got one particular group of diseases. These
break down in a hierarchy, but you might have one piece of evidence
that gives a weight just to this pair of diseases, say intrahepatic
jaundice, and another piece of evidence that suggests extra-hepatic. The
rest of it we leave unassigned. This will give a probability mass over
this hierarchy and that leads to a range of belief: belief in the
possibility of any particular hypothesis, which is just the sum of the
masses on elements below it, and the upper point, which is one minus the
sum over the elements that don't contain the subset of interest. (see
slide 4)
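
To make the range concrete, here is a small sketch using the standard Dempster-Shafer definitions, in which the upper point is one minus the mass on sets disjoint from the hypothesis. The mass values are invented, `other_extrahepatic` is a placeholder for the fourth disease (which the talk does not name), and the combination of separate pieces of evidence is not shown; the masses are simply given directly.

```python
def belief(masses, hypothesis):
    """Lower point: total mass on subsets of the hypothesis."""
    return sum(m for s, m in masses.items() if s <= hypothesis)


def plausibility(masses, hypothesis):
    """Upper point: one minus the mass on sets disjoint from the hypothesis."""
    return 1.0 - sum(m for s, m in masses.items() if not (s & hypothesis))


# Hypothetical frame: four diseases under cholestatic jaundice.
INTRA = frozenset({"hepatitis", "cirrhosis"})
EXTRA = frozenset({"gallstones", "other_extrahepatic"})  # placeholder name
FRAME = INTRA | EXTRA

# Made-up masses: some weight on each pair, the rest left unassigned.
masses = {INTRA: 0.5, EXTRA: 0.2, FRAME: 0.3}

intra_range = (belief(masses, INTRA), plausibility(masses, INTRA))
hepatitis = frozenset({"hepatitis"})
hepatitis_range = (belief(masses, hepatitis), plausibility(masses, hepatitis))
```

With these masses the evidence says nothing specific about hepatitis alone, so its lower point is zero while its upper point stays as high as that of the intrahepatic pair.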

DR. SINGPURWALLA: I am sorry for this point of clarification. I
don't know what all of this means. Which is the disease, the thing at
the top? Which is the figure you are interested in, the bottom or the
top?

DR. SPIEGELHALTER: These are the separate elements of the
disease. You can assume it is one of these. You don't know which, but
they decompose naturally into a hierarchy of subsets.

DR. SINGPURWALLA: So the patient comes in complaining about
gallstones.

DR. SPIEGELHALTER: No, the patient comes in complaining and you
have narrowed it down to cholestatic jaundice. So you assume it is
within here somewhere.

DR. SINGPURWALLA: That is made up of four parts?

DR. SPIEGELHALTER: Yes, four possible diseases that are labeled
cholestatic jaundice, four possible components.

DR. SINGPURWALLA: But I am thinking the patient would come in
complaining about gallstones or something like that.

DR. SPIEGELHALTER: No, these are the unobservable diseases with
a causal network where pieces of evidence affect different levels of
this hierarchy.

DR. SINGPURWALLA: And your goal is to catch the top?

DR. SPIEGELHALTER: No, your goal really will be to identify one
of these particular diseases at the bottom, but your evidence may only
in fact tell you that there might be evidence that supports the
hypothesis that it is either hepatitis or cirrhosis, but there is no
evidence there to tell you what the individual disease might be, and
this is a very common structure.

So what have people been trying to do with uncertainty in expert
systems? What are the practical objectives for the calculus of
uncertainty? There are various things that come out of all of this
discussion. First, a single number applied to a hypothesis is generally
considered insufficient and I would agree with that. Just a system that
goes crunch, crunch and bangs out "probability is .73" is not considered
acceptable and I don't blame anybody for not considering that
acceptable.

So what do people want to be able to do? There are a lot of
claims that they want to be able to cite the sources of contributing
evidence. In MYCIN, there may be a trace of the rules being fired,
showing how that conclusion was reached. In particular in INTERNIST,
you can identify what evidence supports the hypothesis and what evidence
is against the hypothesis so one can see where there is conflict.

They want to be able to cope with hierarchical hypothesis
structures. They would like to propagate through networks when you have
got data coming in in a very sporadic fashion, bits of information
coming in all over the place. They might like to make imprecise
specification of the quantities in the system because people aren't too
happy about giving single numbers.

There is also an idea that the point value for an uncertainty may
be considered unacceptable, and a number of reasons why one might
prefer a range or some sort of curve over that value have been
suggested.

The first is what I would like to call due to "ignorance." This
is related to what Morris DeGroot said yesterday about the .2 for
Shakespeare writing Hamlet, where one would like to know the sensitivity
of that .2 to the coming in of new information. One essentially says if
that .2 is just off the top of your head, based on considerable
ignorance, then one might feel that somehow that .2 is only the center
of a number of possible probabilities that could be attached to that
proposition. I would like to get into that idea a lot more later.

This range is due to limited evidence on a new case, and in
general as the consultation proceeds, this sort of "ignorance range"
would decrease until, when you know everything, one perhaps could feel
very happy about giving a point probability. There are also
imprecisions in the measures of uncertainty but one can think about this
as the limitations in the knowledge base, rather than the limited
evidence on the new case. This range will tend to widen as the
consultation proceeds because more and more imprecise
numbers are being used. This is very much an idea of fuzziness, as was
pointed out yesterday.

Finally,  before  I  get  onto  the  little  slide  show,  I  would  like  to 
talk about something that was not really mentioned much yesterday, the
requirement for operational meaning of the quantifications. This is
something that is not generally given much credence within AI. I feel
it  is  vital  that  there  is  operational  meaning  on  three  levels  of  the 
working  of  the  system. 

First of all, the inputs for the system: if there is going to be
quantified uncertainty, why should these inputs actually mean something?
First of all, it provides a mechanism for agreement among different
people who want to contribute numbers to that system, if these numbers
actually mean something. They can argue about them and they will argue
about them, but at least they know what they are arguing about. It also
provides a means of learning and updating the system. It was asked
yesterday how do you update fuzzy numbers as more information comes in.
With an external objective meaning or interpretation then they can be
updated.

The second reason for wanting operational meaning is the internal
coherence, the manipulations within the system as data comes in. It
should be subject to scrutiny. It should be explainable to people and
be justifiable.

The  third  reason,  which  again  was  not  mentioned  much  yesterday, 
is  about  whether  the  outputs  of  the  system  should  have  external  validity 
and  particularly  in  terms  of  probabilities  the  idea  of  calibration  comes 
in  very  strongly.  If  the  system  is  going  to  be  able  to  sway  people  to 
believe  what  it  says,  then  the  output  should  have  operational  meaning. 

So  at  this  point,  I  would  like  to  go  to  a  description  of  a  system 
that  I  have  been  working  on.  I  am  going  to  give  three  examples  this 
morning  of  systems.  The  first  will  be  in  some  detail  and  the  other  two 
will  be  very  brief.  This  is  a  system  I  have  been  working  on  for  some 
time developed jointly with a gastroenterological unit in Glasgow. The
aim  of  this  study,  first  of  all,  was  to  define  symptoms  and  diseases 
carefully  and  collect  data,  and  work  out  discriminating  systems  to  be 
able  to  give  some  probabilistic  diagnosis  for  new  patients. 

As  I  will  show  in  a  minute,  one  of  the  main  aspects  of  the  study 
is  collecting  data  from  the  patients  by  direct  computer  interviewing. 

The  aim  is  to  identify  particularly  high  risk  or  low  risk  groups  in 
order  to  avoid  unnecessary  investigations,  to  save  money  in  the  health 
service. So the system is designed to interview the patient by
computer, make some sort of probabilistic diagnosis (that is the bit I
want to talk about today), then to make some tenuous recommendations
about management and to make some sort of report to the clinician.

But  it  is  the  second  aspect  I  would  like  to  concentrate  on.  The 
idea is that it can be used on that patient in the clinic by junior
doctors, in health centers by general practitioners and in remote areas
by  paramedical  staff.  At  the  moment  it  is  being  used  in  two  outpatient 
clinics  and  a  health  center  on  a  very  experimental  basis. 

So  the  aim  is  when  the  doctor  sees  the  patient,  the  patient 
should  have  first  of  all  been  interviewed  by  the  computer.  The  report 
of  the  interview  does  not  go  to  the  patient,  but  the  doctor  then  can  get 
details  of  the  symptoms,  an  indication  of  important  findings,  some  sort 
of  probabilistic  diagnosis,  some  suggestions  on  future  management,  and 
the  aim  is  that  that  should  save  time  in  both  his  interview  with  the 
patient  and  be  able  to  concentrate  on  important  features  and  in  further 
investigations  be  able  to  save  money. 


It was considered in this that it was very important that
explanation facilities were available and we did not just give out blank
probabilities of diseases. One of the things that characterize this as
a statistical system was that it was based on data analysis rather than
subjective probabilities. There was subjective knowledge that goes into
the structure of the system but not into the quantifications. So the
poor doctors had to do 1200 interviews of patients - this is just the
first sheet of a ten-page form and one can see why this is not done very
often. But it gives an enormous amount of information in order to make
some sort of developmental diagnostic system.

But  now  this  is  one  of  the  original  interviewing  systems  based  on 
a terminal to a DEC 11, with a patient sitting in a booth actually
typing away on a special keyboard. The sort of questions that come up
on the screen are written in pretty colloquial language, written by a
psychologist working with the team. So "does the pain wake you up at
night?" "Does this happen often?" "Yes." "When it wakes you up, do
you have a little drink?" "Does it relieve the pain?" And a yes to
that  is  indicative  of  a  peptic  ulcer.  Lots  of  people  wake  up  at  night 
with  pain  but  getting  relief  from  a  glass  of  milk  or  a  snack  is 
indicative  of  a  peptic  ulcer. 

I  don't  think  this  is  a  good  example  chosen  there  from  the 
interview,  because  I  think  that  last  question  is  slightly  ambiguous, 
"does  this  happen  often?"  It  concerns  when  you  wake  up  at  night  do  you 
often  get  relief  and  I  don't  think  it  is  quite  clear  from  that  question. 
But  the  interview  is  slightly  intelligent.  It  asks  about  the  main 
symptoms the patient is complaining about first and it just branches
depending  on  the  answer.  But  it  is  not  particularly  intelligent.  It 
takes  about  25  minutes  to  go  through. 

It  is  now  being  put  on  the  Apple  with  a  special  keyboard.  The 
printer  is  there  by  the  Apple  for  demonstration  purposes.  But  it  is 
very  easy  to  use.  There  is  a  close  up  of  the  keyboard.  There  is  a 
"don't  understand"  button  which  generates  more  explanation. 

There  are  six  buttons  which  qualify  the  degree  of  certainty  the 
patient  has  about  the  finding.  These  are  a  complete  sham.  Any  pressing 
on  four,  five  or  six  is  a  yes  response  and  any  pressing  on  one,  two  or 
three is a no response. Those buttons were put there because of people
saying  they  want  to  qualify  their  answers.  I  would  not  mind  some 
suggestions  on  how  those  should  be  incorporated  into  the  analysis.  At 
the  moment  they're  ignored. 

This  is  in  Swedish,  and  the  interview  is  being  translated  into 
Dutch  and  Swedish.  The  Swedes  have  written  a  program  so  at  the  end  of 
the interview it can generate a complete letter to the general
practitioner  about  the  patient. 

So  that  in  itself  is  a  valuable  thing,  the  taking  of  the 
interview.  It  has  been  shown  that  people  are  more  honest  to  computers 
about  how  much  they  drink.  People  like  the  computer.  Now  this 
generates  a  vast  amount  of  data.  The  aim  is  to  produce  a  simple, 
accurate device relating the symptoms from the interview to the possible
diagnosis.


One problem is that people have got more than one disease very
often and also there are a large number of questions that are asked.

The technique I have adopted in the analysis is directly stolen from
many AI applications, which divides up as a "binary task formulation":
to go through each disease in turn and say what is the probability "they
have got it" against "they have not got it." So every disease is
considered as a separate task. This is probably not optimal, but it
makes it very simple to use and explain to people.

The first thing is to collect single discrimination variables and
this is essentially exactly the approach of the abdominal system for
abdominal pain and the PROSPECTOR system where you just look at
likelihood ratios. Your initial odds on a peptic ulcer would be 194 to
358. Then they say they often wake at night and get relief from their
pain by a drink or a snack and we can see that turns the posterior odds,
just considering that single piece of information, into 81 to 42. Thus
you multiply the prior odds to get the posterior odds by this factor in
between - the likelihood ratio - take logs to turn it into a summation,
multiply by 100 to turn it into a whole number and you end up with what
is known as the weight of evidence, which is a term used by Jack Good,
which is just the log likelihood ratio.
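
The arithmetic for this one finding can be sketched as follows. Two assumptions are made here: the partly garbled odds in the transcript are read as 194 to 358 and 81 to 42, and natural logarithms are used, since the talk does not say which base was taken. The function name is hypothetical.

```python
import math


def weight_of_evidence(prior_odds, posterior_odds):
    """100 x natural log of the likelihood ratio, as a whole number."""
    likelihood_ratio = posterior_odds / prior_odds
    return round(100 * math.log(likelihood_ratio))


prior_odds = 194 / 358      # assumed reading of the prior odds
posterior_odds = 81 / 42    # assumed reading of the posterior odds
score = weight_of_evidence(prior_odds, posterior_odds)
```

Under those assumptions the likelihood ratio is about 3.6, so this finding contributes a positive score, and scores for several findings can then be added rather than multiplied.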

So what this does is turn an Idiot's Bayes system into a scoring
system.

The next thing is to say that Idiot's Bayes is crummy, because it
assumes all pieces of information are independent within the disease and
the not-disease class, so we want to allow some dependence, so you throw
this into a logistic regression package and that will tend to squeeze
down the scores, what I call crude scores, to adjust them to allow for
the dependence between them.

The aim is to produce a scoring system where the outputs actually
are calibrated numbers. I will come back to that.

DR.  LINDLEY:  I  don't  understand  that. 

DR. SPIEGELHALTER: What I have done is put the crude scores,
which are these crude weights of evidence, and taken those as the data,
put them into a regression package.

DR.  SHAFER:  What  is  the  dependent  variable? 

DR. SPIEGELHALTER: The dependent variable in the logistic
regression is the log odds on peptic ulcer being present.
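
A toy illustration of why the regression squeezes the crude scores down is sketched below. It is not the package or data used in the study: the data are made up, the fit is a plain gradient ascent written out by hand, and the extreme case shown is one finding recorded twice under two names, so that Idiot's Bayes would count it double.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, steps=500, rate=0.5):
    """Gradient ascent on the mean log-likelihood; returns (intercept, coefs)."""
    n_features = len(rows[0])
    intercept, coefs = 0.0, [0.0] * n_features
    for _ in range(steps):
        grad0, grad = 0.0, [0.0] * n_features
        for x, y in zip(rows, labels):
            p = sigmoid(intercept + sum(c * xi for c, xi in zip(coefs, x)))
            grad0 += y - p
            for j, xi in enumerate(x):
                grad[j] += (y - p) * xi
        intercept += rate * grad0 / len(rows)
        coefs = [c + rate * g / len(rows) for c, g in zip(coefs, grad)]
    return intercept, coefs


# Made-up data: a finding present in 80% of diseased and 20% of
# non-diseased patients, entered twice (perfect double counting).
w = math.log(0.8 / 0.2)  # crude weight of evidence in natural-log units
rows, labels = [], []
for diseased, present, count in [(1, 1, 40), (1, 0, 10), (0, 1, 10), (0, 0, 40)]:
    crude = w if present else -w
    rows += [(crude, crude)] * count
    labels += [diseased] * count

intercept, coefs = fit_logistic(rows, labels)
```

In this constructed case each duplicated score ends up with a coefficient of about one half, so the pair together carries the weight of a single finding.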

DR. REEVES: At no point have you mentioned the physical
evidence, like blood tests, urinalysis.


DR. SPIEGELHALTER: I am sorry, I should have explained that. I
have not put in the blood test. They have been done on everybody, but
this is designed to get the maximum information from symptomatology
alone.

And then you can turn the final score, which you get by adding up
all of these numbers, into a probability. The point being about all this
is that you end up with what is a simple system to explain and to use.

It makes the computer program actually trivial. As the information is
typed in by the patient in response to questions on the screen the
numbers are added up inside the machine as evidence towards the disease
peptic ulcer. (see slide 5)

What this allows is explanation facilities of the type shown
here. At the end of an interview one can state what is the evidence
against a particular hypothesis and what is the evidence for a
particular hypothesis. This is trying to introduce ideas from AI and
put them into a statistical, probabilistic system. So one can say to
the doctor what are the important pieces of information and that could
perhaps be more carefully checked on the patient.

So  one  works  out  the  evidence  for,  evidence  against  and  one  can 
calculate  the  balance  of  evidence  and  in  this  case  it  is  quite  strongly 
in  favor  of  peptic  ulcer.  There  is  an  initial  score  which  reflects  the 
prior  probability  of  peptic  ulcer  in  that  particular  group  of  patients 
and  that  gives  you  the  final  score.  And  that  final  score  can  be 
translated  into  the  probability  of  a  peptic  ulcer,  (see  slide  6) 
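
The step from scores to a probability can be sketched like this, assuming the scores are 100 times natural-log odds (the base is not stated in the talk). The balance of evidence of 160 is the figure quoted for the slide; the 0.35 prior probability is a made-up value standing in for the prevalence in the patient group, and the function names are hypothetical.

```python
import math


def score_from_probability(p):
    """Initial score: 100 x natural-log odds of the prior probability."""
    return 100 * math.log(p / (1 - p))


def probability_from_score(score):
    """Turn a final score on the 100 x log-odds scale into a probability."""
    return 1 / (1 + math.exp(-score / 100))


initial_score = score_from_probability(0.35)  # hypothetical prevalence
balance_of_evidence = 160                     # figure quoted in the talk
final_score = initial_score + balance_of_evidence
final_probability = probability_from_score(final_score)
```

Because everything is additive on the score scale, the machine only has to keep a running total during the interview and apply the last function once at the end.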

There  is  light  of  conflict  which  we  can  introduce,  which  we  are 
defining at the moment, and I am not sure about this, as being the ratio
of  the  total  evidence,  that  118  plus  278,  to  the  balance  of  evidence 
which  is  160.  So  there  is  a  conflict  of  2.5.  The  conflict  ratio  would 
be  large  if  you  have  got  lots  of  evidence  pointing  in  each  direction. 

It  will  go  down  to  one  if  all  of  the  evidence  is  in  one  direction. 
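
The conflict ratio as just defined is easy to write down. In this sketch both amounts of evidence are passed as non-negative magnitudes; returning infinity when the evidence is exactly balanced is my own choice for the degenerate case, not something stated in the talk.

```python
def conflict_ratio(evidence_for, evidence_against):
    """Total evidence divided by the absolute balance of evidence."""
    total = evidence_for + evidence_against
    balance = abs(evidence_for - evidence_against)
    if balance == 0:
        return float("inf")  # assumed convention for evenly split evidence
    return total / balance


ratio = conflict_ratio(278, 118)   # the figures quoted above
one_sided = conflict_ratio(278, 0)
```

With the quoted figures this gives 396 / 160, which is the "conflict of 2.5" mentioned in the talk after rounding, and with all the evidence on one side the ratio is exactly one.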

John Tukey has suggested the output for this program could be
displayed graphically, by showing how the scores change starting at the
bottom with the prevalence score (on that graph it is shown as a
probability)  and  then  the  evidence  against  the  hypothesis  of  peptic 
ulcer  shows  it  shifting  to  the  left  and  the  evidence  for  it  shows  it 
shifting  to  the  right.  There  is  a  graphical  representation  of  the 
contributing  aspects  of  evidence  going  up  to  the  final  probability  on 
peptic  ulcer,  (see  slide  7) 

Here is an example showing conflict in action - actually not
showing conflict. This is for alcohol-induced dyspepsia, which is
pretty common in Glasgow. These are the important questions about
nausea before breakfast, retching, and the point is that the system, by
looking at conflict of evidence, can identify someone who has all of the
symptoms of alcoholism and yet refuses to admit he is drinking. (see
slide 8)


If that patient had said "Alcohol intake? No, Doctor, I never
touch a drop," that would come up as a very large conflict ratio. Lots
of evidence is for and lots is against it. But that should ring bells
and let the doctor know that perhaps this patient should be questioned
more carefully.

This is an example in the paper: evidence for and against
gallstones. It is a sort of an account sheet of evidence looking at
conflict, and justifying the final chance. You notice I say chance
there. There is a good reason why I am using the word chance, because
of what we actually mean.

What I would like to talk a bit about now is the idea I mentioned
earlier about ignorance. How does one incorporate ignorance into a
probabilistic system? Ignorance, I view as meaning it is very feasible
that the probability that you have at the moment could change very
dramatically, because it is based on very little information. One way
that one can look at this is to say what are the possible probabilities
that can be obtained by a patient at a particular point in the
interview.

So  before  an  interview  starts  there  is  the  distribution  on  the 
possible scores that can be obtained for diagnosis of gallstones. Most
people are going to get very low scores; some will get fairly high
scores and the ones shaded are the people actually with gallstones. So
what we consider when we start an interview, the probability of
gallstones is only about five percent. But that could change
dramatically as information comes into the system, so we would say that
5%  probability  was  based  on  considerable  ignorance. 

Why  I  called  those  probabilities  chances  is  that  because  of  the 
way  in  which  this  has  been  designed,  the  data  analysis  that  has  gone 
into the system, these probabilities are calibrated. When the system
says 61 percent chance of gallstones, then round about 60 percent of
the time it is going to be right. This shows a rough calibration curve
where you plot along the bottom the probability given to a disease by
the computer system and along the side the actual proportion of times
the disease actually turns out to be present.
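
One simple way to compute such a curve is to group the predictions into bins and compare the average prediction in each bin with the observed proportion of disease. The sketch below assumes ten equal-width bins and uses invented data; neither the binning nor the function name comes from the study.

```python
def calibration_curve(predicted, outcomes, n_bins=10):
    """Per non-empty bin: (mean predicted probability,
    observed proportion with the disease, number of cases)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 joins the top bin
        bins[index].append((p, y))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            curve.append((mean_p, observed, len(members)))
    return curve


# Made-up example: ten patients given a 60% chance, six of whom had the disease.
curve = calibration_curve([0.6] * 10, [1] * 6 + [0] * 4)
```

A well calibrated system gives points near the diagonal, where the first two numbers of each triple agree, as they do in this constructed example.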

If the probabilities mean something, if they can be calibrated
then that line is about on the diagonal. The solid line are the doctors
and that is a typical pattern of gross overconfidence. They say "I'm
99 percent sure it is a peptic ulcer" and they are only right about 80
percent of the time. They have gotten better now with training. So the
point is you can go through this and one gets probabilities. They can
add up to more than one, because you have got multiple diagnoses. They
could add up to a lot less than one.

I don't really want to emphasize actually making recommendations.
Essentially for low probability, we would say you can ignore the
disease. For very high probability, you should investigate it. You
should perhaps treat straightaway and in between you should recommend
investigations. The point is to cut down the number of
unnecessary negative investigations.


There are some particular aspects that I would like to just
emphasize again. The first thing is we have actually coped with
hierarchical diseases, although that did not come out in the
presentation. Dyspepsia is a hierarchical disease structure in which,
for example, the disease peptic ulcer breaks down into duodenal or
gastric ulcer and many symptoms come in at the level of discriminating
peptic ulcer from people who have not got a peptic ulcer. A few
symptoms come in for discriminating duodenal from gastric ulcer.

How we deal with that is to treat those as two separate problems,
essentially in the hierarchical taxonomy to do one division between
peptic ulcer and non-peptic ulcer and get a probability of peptic ulcer
and then get a probability of duodenal ulcer, given it is within the
peptic ulcer class. That comes out from a separate scoring system.
These can be combined essentially by multiplying the probability down
the tree to give an overall probability of duodenal ulcer.
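Multiplying probabilities down the tree can be illustrated with a short Python sketch. The function name and the two numbers are hypothetical and chosen only for the arithmetic; they are not figures from the talk.

```python
def probability_down_tree(conditional_probabilities):
    """Multiply conditional probabilities along one path of a disease
    taxonomy, from the top division down to the leaf of interest."""
    overall = 1.0
    for p in conditional_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each entry must be a probability in [0, 1]")
        overall *= p
    return overall


# Hypothetical values: P(peptic ulcer) from one scoring system and
# P(duodenal | peptic ulcer) from a separate scoring system.
p_peptic = 0.68
p_duodenal_given_peptic = 0.75
p_duodenal = probability_down_tree([p_peptic, p_duodenal_given_peptic])
```

With these illustrative inputs the overall probability of duodenal ulcer is 0.68 x 0.75 = 0.51.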

So within this fairly simple taxonomic structure, we can handle
it using probabilities. Other aspects concern the idea of probability
ranges. Before the interview starts, the probability of gallstones is
about five percent and there is a fairly tight standard error around that
value. That is an imprecision coming from a knowledge base. So we
start off with a fairly precise probability, but one that is almost
totally vacuous for decision making. You can say probability of five
percent but it is based on almost no evidence whatsoever. That is
reflected by the fact that you know at the end of the interview the
probabilities can range anywhere along this distribution.

So your predicted distribution of the final probability that
could be taken on is very wide.

DR. WISE: Is this taken to be a data distribution?

DR. SPIEGELHALTER: Yes, this is just an empirical distribution
of the final probabilities that it could take on. After the interview
is finished, say, in that patient that we talked about earlier, you get
a probability of gallstones of 61 percent and that has actually got
quite a wide standard error around it. This can be calculated if
necessary. So the 95 percent probability interval is 40 percent to 78
percent, so it is a fairly imprecise number, because you have put
in a lot of these scores with error attached to them. So you end up
with imprecision in the number multiplying up as the consultation
proceeds.

DR. SINGPURWALLA: You assumed independence.

DR. SPIEGELHALTER: Not quite, because of the regression
analysis. So it is a fairly imprecise number but one based on
considerable evidence, compared with the initial number which is precise
but pretty vacuous and ignorant. And I will come back to that in a
little bit, but I better just move on.


I would like to talk very briefly now about another system some
colleagues have been working on, IMMEDIATE (Intelligent Modular Medical
Information for Assessment, Treatment and Education). This is a real
Prolog based expert system, rule based. It looks a bit like PROSPECTOR
and it is designed for general practice. This is a particular module of
it dealing with gynaecological problems.

I just want to talk about one particular aspect which relates
again back to the idea of ignorance and possible probabilities that can
be taken on in the future. The important thing about a computer system
that is going to be used by general practice is that it should be very
unobtrusive and should only ask questions when it is convinced the
question should be asked. It uses this idea of importance driven
control, where new information comes into the system and uncertainty is
propagated through the nodes, using what we are attempting to have as a
coherent calculus, but we are struggling a bit on that one, and then
backwards comes some idea of importance of questions that have not
yet been asked. This suggests an ordering of questions that can be asked by
the general practitioner that come up on the screen indicating their
importance to be asked.

So the idea of importance is very important. One can think of it
as sort of an ad hoc way of trying to do a decision analysis, to try to
ask what are the important questions to ask next.

DR.  WISE:  You  say  you  are  planning  on  revising  this  ordering? 

DR. SPIEGELHALTER: Yes. I can't explain the computation
techniques, but it is pretty heavy stuff, because you have to do a
complete search forwards, the whole time. What goes into deciding on
whether a question should be asked or not has to do with the current
certainty of the question.

And the "certainty limits." These are the possible extreme
values that that probability could take on when further questions are
asked. This summarizes our current ignorance about that particular
question, ignorance specifically related to what we don't know but could
know within the system. There is also a measure of the potential
importance of the answer. And you end up with, I mean this is an amazing
phrase and I am not responsible for it, an "importance actualization
function," which combines these three components and, in an ad hoc way,
is trying to get over an idea of the expected change in utility, to give
an idea of investigative importance.

DR. DEMPSTER: David, are all of these things formulas?

DR. SPIEGELHALTER: Yes.

DR. DEMPSTER: So there is some mathematical function?




DR. SPIEGELHALTER: Yes, but it is completely ad hoc.

DR. DEMPSTER: It is not totally fuzzy in terms of words?

DR. SPIEGELHALTER: No. Again, I would like to come back to this
idea of formalizing ignorance, formalizing what we don't know, which
relates both to the dyspepsia system and to IMMEDIATE. Let X be what we
currently don't know within the system, the nodes that have not been
established yet, so these are knowable things but yet haven't been
asked. Let P be our current belief in some disease D. Suppose that
were we to observe little x, we'd end up with a final posterior
probability of the disease D. What we should try to do is calculate the
predictive distribution of the final probabilities that could occur, and
use this for control and explanation purposes.

Now, what I am saying is that one has a certainty of any
hypothesis at any time, but the idea of ignorance is interpreted as
meaning that it could change dramatically when new information becomes
available and that is actually formalized by carrying over a particular
distribution on the possible probabilities that could occur when more
information comes in. However, IMMEDIATE only looks at the range of
these, but in the dyspepsia system we are trying to incorporate the
entire distribution.

Now, with a trivial bit of sums, we actually find that the
expectation of this distribution of final possible probabilities is in
fact the initial probability. So all we can say is our current belief
in any hypothesis can be looked upon as the expectation of the future
belief that we might have when we finally finish the consultation.

DR. SINGPURWALLA: How did you average out the evidence X in so
doing? Isn't X the evidence?

DR. SPIEGELHALTER: X is the stuff we don't know yet. It is the
questions we have not asked yet.

DR. SINGPURWALLA: So you are averaging out?

DR. SPIEGELHALTER: We are averaging out with respect to what we
haven't asked yet and we end up with just what we know at the moment.

It is a reasonable thing. It is just a martingale.
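The "trivial bit of sums" can be checked numerically. The sketch below enumerates every answer pattern to a few unasked yes/no questions, computes the final posterior for each pattern, and weights it by the predictive probability of that pattern. It assumes, purely for illustration, that answers are conditionally independent given disease status (the talk notes this is "not quite" what the real system assumes); the function name, the prior and the likelihoods are hypothetical.

```python
from itertools import product


def predictive_distribution(prior, likelihoods):
    """Predictive distribution of the final posterior probability of D.

    likelihoods: one (P(yes | D), P(yes | not D)) pair per unasked question.
    Returns a list of (final posterior, probability of that answer pattern).
    Assumes answers are conditionally independent given D (illustrative only).
    """
    results = []
    for answers in product([True, False], repeat=len(likelihoods)):
        like_d, like_not_d = 1.0, 1.0
        for said_yes, (p_d, p_not_d) in zip(answers, likelihoods):
            like_d *= p_d if said_yes else 1.0 - p_d
            like_not_d *= p_not_d if said_yes else 1.0 - p_not_d
        p_answers = prior * like_d + (1.0 - prior) * like_not_d
        results.append((prior * like_d / p_answers, p_answers))
    return results


# Hypothetical inputs: a 5% starting probability and three unasked questions.
prior = 0.05
dist = predictive_distribution(prior, [(0.8, 0.1), (0.6, 0.2), (0.7, 0.3)])

expected_final = sum(post * weight for post, weight in dist)
lowest = min(post for post, _ in dist)    # lower extreme of the final probability
highest = max(post for post, _ in dist)   # upper extreme of the final probability
```

The expectation of the final probability comes back to the current 5 percent, while the extremes show how far the belief could still move, which is the sense of ignorance being described.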

So this is the idea of a distribution which reflects our current
ignorance, which narrows as the consultation proceeds, because the final
posterior probabilities are narrowed down further and further. Now, in
general, this definition of ignorance in terms of explicitly what we
don't know yet is not possible within general statistics, because one
can't enumerate all of the questions that have not been asked.

However, what characterizes an expert system is it is a closed
body of knowledge. The actual computing is heavy, because one has to
work out all of the time what has not been asked yet. But one can
theoretically work out a predicted distribution over all possible
answers that could occur when the consultation is finished, and that
provides a formal definition of our current ignorance concerning the
truth of our hypothesis. That is only possible within closed systems,
such as expert systems.

I better just be drawing to a close now. I would like to go on
to talk about relation to fuzziness. I have talked completely within a
probabilistic framework and I believe there are areas where
probabilities won't work, and I will just very briefly talk about those
now.

When do probabilities make sense and when don't they make sense?
The first thing is if you do have the idea of the degree of truth, if
your propositions are not crisply defined, then it seems quite
reasonable to use some type of fuzzy measure. However, I believe that
one can often avoid this in our computer interviewing system. We might
have questions that might appear not to be crisply defined, such as "Do
you often wake up at night?"

Now, the statement, "the patient often wakes up at night," one
can think of as a pretty fuzzy statement. But if one only interprets
the "patient often wakes up at night" in terms of "when asked whether he
wakes up often at night, the patient has pressed the button yes" and if
the statement "the patient often wakes up at night" and the explanation
and all thinking about the problem is always viewed in those terms, then
that is a sort of cheap way to crispify the statement. The phrase
should not even be interpreted as being the truth about the patient, but
should only be interpreted in terms of the specific button that has been
pushed when the person has been sitting in front of the expert system.
So that seems a way around some of the doubts about fuzziness and how
propositions can in fact be crisp if they are given that interpretation.

The other way our probabilities might not make sense is if the
propositions are not verifiable. This would seem to be most appropriate
within the control of systems. I mentioned yesterday in statistical
expert systems you might have got unverifiable propositions like
assumptions one would like to make, like there are normal errors and
there are linear relationships, which are useful for control. They are
essentially conclusions, interim conclusions one would like to make.

It seems quite reasonable that some other calculus might be used
in order to justify those assumptions, rather than probabilities. Also
for control purposes there may be an idea of a degree of support for
conclusions, a compatibility of one set of data and one set of
hypotheses.

I better stop now. What I have argued about is that in dealing
with uncertainty, it is not cut and dried. There are many linguistic ways
in which it is used and there are areas where probability might not be
the appropriate thing to use, in particular for control purposes in
drawing conclusions.




However, probability I feel is enormously more flexible than the
way it has been portrayed. I think it can cope with hierarchical
disease structures. I think it can cope with conflicting evidence and
explanation in the way GLADYS does and I believe that it can cope with
some degree of imprecision, although I am still not quite sure of the
best way to do this. In particular, I believe it can cope with the
concept of ignorance within an expert system when you can specifically
state what you don't know yet. You can say the potential probabilities
that can be taken on and that can be used as a description of one's
current ignorance.

But the major advantage in probabilities is, I believe, that of
operational meaning. The inputs mean something. They can be argued
about. The manipulations mean something, although as I pointed out
yesterday, trying to get a network system to propagate probabilities in
a coherent way is a very difficult problem. And the outputs can be made
to mean something in terms of their calibration and their interpretation
for future action, and I feel that that is the argument that I find
convincing. And I must say in discussing it with my clinical colleagues
they also find it convincing.

Thank you very much.

(Applause)


An example of the three-level
description of disease abstracted
from the CASNET/Glaucoma model


Slide  1 


[Figure: hierarchy for cholestatic jaundice with belief functions
(Gordon & Shortliffe). Node labels: Cholestatic jaundice;
Hep + Cirr + Gall + Pan; Intrahepatic; Hep + Cirr (Hepatitis, Cirrhosis);
Gall + Pan (Gallstones, Pancreatitis). Belief interval on
"Intrahepatic": (.6, .3)]

Slide  4 


SYMPTOM SCORES - PEPTIC ULCER

Starting score  -84

Indicant                      Present / Absent (values in printed order)

Pain in epigastrium            28    -50
Pain comes in episodes         19    -57
Pain worse in winter           91     -9
Wake and relief often         100     25    -50
Length of history >4 yrs       69     -3    -75
Previous ulcer operation      128     -5
Family history of ulcer        39    -26
Smoker                         41    -74
Pointing sign                  86     19    -18
Relief from food               44    -42


Slide  5 


Evidence FOR Peptic Ulcer

  Abdominal pain                                       (+9)
  Episodic                                             (+19)
  Relieved by food                                     (+44)
  Occasionally woken at night and relieved by snack    (+21)
  Epigastric                                           (+28)
  Points at site of pain with fingers                  (+19)
  Family history of ulcer                              (+39)
  Smoker                                               (+41)
  Vomits, then eats within 3 hours                     (+54)
                                                       +278

Evidence AGAINST Peptic Ulcer

  Length of history less than 1 year
  No previous operation for ulcer
  No seasonal effect on pain
  No waterbrash

Balance of evidence    +160
(Total evidence 396: conflict ratio ≈ 2)

Initial score    -84   (corresponding to prevalence of 30%)
Final score      +76   = 68% chance of peptic ulcer


Slide  6 




Slide  7 


SYMPTOM SCORES - ALCOHOL INDUCED DYSPEPSIA

Indicant                    Score    Cumulative    Probability

(Starting score)                        -444
Male                         104        -340           .03
Single/separated              70        -270           .06
No abdominal pain            129        -141           .20
Nausea before breakfast      126         -15           .46
Retching                      75          60           .65
Painless diarrhoea            91         151           .82
Heavy smoker                  81         232           .91
Alcohol intake - heavy       373         605           .99

Slide  8 
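The link between scores and probabilities on these slides can be sketched as follows. The sketch assumes that a score is 100 times the natural log odds; that is an inference from the printed figures (a score of -84 beside a prevalence of 30%, and cumulative scores of -340, -15 and 232 beside probabilities of .03, .46 and .91), not a formula stated in the talk. The function names are hypothetical.

```python
import math


def score_to_probability(score):
    """Convert a score to a probability, ASSUMING score = 100 * ln(odds)."""
    return 1.0 / (1.0 + math.exp(-score / 100.0))


def cumulative_scores(starting_score, indicant_scores):
    """Running total of the starting score plus each indicant score in turn."""
    totals = []
    running = starting_score
    for s in indicant_scores:
        running += s
        totals.append(running)
    return totals


# First four indicant scores printed on the alcohol induced dyspepsia slide.
totals = cumulative_scores(-444, [104, 70, 129, 126])
probabilities = [round(score_to_probability(t), 2) for t in totals]
```

Under this assumption the running totals are -340, -270, -141 and -15, and the rounded probabilities agree with the printed column.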


DISCUSSION ON PRESENTATION OF DAVID SPIEGELHALTER


DR. DeGROOT: Thank you very much, David. I guess we will
follow the same format as we did yesterday. I will give Art Dempster a
chance to comment, if you have comments.

DR. DEMPSTER: Actually, I have been operating pretty much in the
learning mode for the last hour and enjoying myself. Of two things I
might mention, one is from a Bayesian perspective. A Bayesian
perspective to me more or less includes belief functions. That they are
the same kind of thing is a reaction to GLADYS.

GLADYS, from reading the JRSS paper by David and Knill-Jones, was
based on data from 1200 patients, I believe. So it is really a
statistical system. The kind of question I wonder about, after thinking
about the AI approach to things, is wouldn't you have been able to
create a pretty good system without ever using those 1200 patients?

From  a  Bayesian  perspective,  then,  that  is  using  a  prior. 

Why  has  all  of  this  prior  information  been  left  out  of  the 
picture  totally  and  could  not  one  do  twice  as  good,  and  whatever,  if  you 
used  it? 

The  second  comment  on  a  totally  different  topic  has  to  do  with 
the  narrower  issue,  this  business  of  tree  structures  on  the  diseases 
that Glenn and David mentioned. I was just going to mention that
Augustine  Kong's  thesis  develops  models  in  the  belief  function 
framework,  which  I  think  can  be  quite  useful  for  pursuing  that  kind  of 
thing.  If  the  chairman  wishes,  I  am  sure  Augustine  could  tell  us  about 
that  for  five  or  ten  minutes. 

DR.  DeGROOT:  I  call  on  Stephen  Watson. 

DR.  WATSON:  I  too  found  this  talk  very  intriguing  and  very 
interesting.  It  is  nice  to  hear  someone  who  has  actually  spent  some 
time  constructing  one  of  these  systems.  As  David  was  talking,  the 
questions that were rising in my mind were, in constructing one of these
systems,  that  all  of  the  time  one  needs  to  make  analytical  decisions  in 
the  process  of  constructing  a  model.  Modeling  decisions,  shall  we  say? 

First, in constructing a model, you have to decide just how to do
it.  Do  you  do  it  this  way  or  do  you  do  it  that  way?  One  of  the 
theories  that  that  subject  does  not  seem  to  have  very  strongly  developed 
is  the  theory  of  how  to  construct  models  which  use  some  of  these  ideas. 
It  is  really  a  question  of  validation.  How  do  you  know  that  some 
particular  system  you  have  constructed  is  a  good  one? 

Now,  I  have  really  no  answers  to  this,  except  to  say  that  there 
seems to me to be four different points that are worth making. I would
value  David's  reaction  as  to  whether  he  actually  did  use  any  or  all  of 
these  four  principles  in  validation  in  constructing  this  system.  Or  if 
he  feels  they  are  used  in  the  construction  of  a  similar  system. 


First, the faithfulness to a normative principle, for example,
probability. David said yesterday with some force that probability is
the only way of handling uncertainty and if this is the case and we are
constructing an expert system which is supposed to reflect uncertainty,
then the question we want to ask of that system is how well is it using
probability theory.

And if it is not using it, is it just not using it because the
probability theory is too difficult and you are having to find some
approximation to it, or is it actually going against the principles of
probability theory in a way that is unacceptable?

Calibration is one David did mention. I was interested to see
that he had done this. This is something that has come to my mind and I
was interested to see that in this particular case one was able to say
this model is a well-calibrated assessor of probability. To the extent
that you believe in probability it is a good thing. Many expert systems
are not designed to produce a probability. They are designed to make
decisions and actually effect control.

Now, in that case, you can't use calibration so obviously as a
criterion in validation. But in a system which is designed to produce
a probability, you obviously can.

User satisfaction is another criterion we use in validation. I
find it a very difficult one to go along with. Because how do we know
that the user is right to be satisfied? The danger is that we are
selling him flimflam or packaging and we are not selling him anything
which actually does anything for him.

There are lots of techniques that people peddle. Sometimes I
fear that they are peddling them well and get a lot of user satisfaction
because they are good marketing men, not because they have got a good
product. So I think user satisfaction is, for validating, something
that you must treat very carefully.

I felt that David was actually using that quite a bit in what he
was saying about the clinicians needing to have certain qualities in
their aid before they would be prepared to use it. The user had to be
satisfied they were doing something. Perhaps user satisfaction has to
be gone along with only insofar as in doing so we don't go against some
of the other principles of validation.

I suppose the most obvious one in judging whether an expert
system is good is to see if it compares well with expert performance.
Now, here again I think there are two views in the literature on expert
systems. Either you are trying to construct something that does it as
well as the best human around, or you say the best humans around don't
do it very well and we ought to construct something which is better than
the best humans around.

If the latter position is taken you don't want it to be similar
to expert performance. You want it to be perhaps faithful to the
normative principle. These are some ideas and I would like David's
response to them.

DR. DeGROOT: Okay, David, respond.

DR. SPIEGELHALTER: First of all, the thing of why do we use
prior information. You are quite right, we nearly broke their hearts
back in the states. The project nearly collapsed and I did not come
into it until just about the end of their data collection exercise.
They ran out of grant money and it was a terrible waste not to have
something worked out. I mean they started a long time ago, since so
many of the techniques that I have been using are direct ripoffs from AI
stuff that I have only just come across, that I don't think we could
have built this thing five years ago.

But, yes, certainly the most sensible thing in designing a system
like this would be to start off using completely subjective opinion and
then update it in proper Bayesian ways as data comes along. We are
trying to do that with one, working in chest diseases at Westminster
Hospital, with a system for diagnosis for asthma, et cetera, to be used
as an initial test to decide about further investigations. And there we
have got the clinician assessing quite a large number of subjective
probabilities and particular findings, the probability of yellow sputum
in a male in this age group, with shortness of breath, and no chest
pain, no coughing up blood and asthma.

So we look at fairly restricted disease groups and ask them for
these subjective probabilities. Now, quite reasonably he does not mind
being very precise about these and as people have said, I think the
number is fairly high, well --

DR. WISE: I am sorry to jump ahead, but when it says 50 to 90
percent, is that one standard deviation or two?

DR. SPIEGELHALTER: In the questioning -- I mean, we generally
take it so if it is outside that, he says he would be pretty surprised.
We try to get them so we take them as being about one in 20 chance and
so we will make them give an interval for which they would think they
are 95 percent sure. Now, whether that 95 percent means 95 is a
different subject. So when he gives a range, we interpret that as a
95 percent credible interval, or whatever, based on an imaginary
sample. That in fact corresponds to the interval that you would obtain
were you to observe an imaginary sample of 20 asthma patients, of whom
14 had the symptoms.

The number is stored in the system, not as a probability but as a
fraction. It is stored as 14 out of 20, so that when new data comes
along and, say, we observe another ten real patients in this group, nine
of whom have got the symptom, then we can update that number, that
fraction into a new fraction and so update the probability.
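The stored-fraction update can be sketched directly from the numbers quoted: 14 out of 20 imaginary patients, followed by nine out of ten real ones. The class and method names are hypothetical, and the rough interval (mean plus or minus two standard deviations of a beta distribution on the counts) is only one possible reading, added here as an assumption for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFraction:
    """A probability held as 'with_symptom out of total' (hypothetical name)."""
    with_symptom: int
    total: int

    def probability(self):
        return self.with_symptom / self.total

    def update(self, new_with_symptom, new_total):
        """Pool newly observed patients with the stored (imaginary) sample."""
        return StoredFraction(self.with_symptom + new_with_symptom,
                              self.total + new_total)

    def rough_interval(self):
        """Mean +/- 2 sd, ASSUMING a beta distribution on the counts."""
        a, b = self.with_symptom, self.total - self.with_symptom
        mean = a / (a + b)
        sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
        return mean - 2 * sd, mean + 2 * sd


prior = StoredFraction(14, 20)      # clinician's range as an imaginary sample
posterior = prior.update(9, 10)     # ten real patients, nine with the symptom
low, high = prior.rough_interval()
```

The updated fraction is 23 out of 30, and under the stated assumption the rough interval for 14 out of 20 comes out at about 50 to 90 percent.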


DR. WISE: But if that formula is right, you are implicitly
assuming that it is a beta distribution.

DR. SPIEGELHALTER: Well, no, there are, you can update using
means, sample means. All right, a beta distribution would give that,
but you can do updating with these combinations of prior means and
sample means under much more general assumptions than beta distribution,
but so who cares? It is just a rough idea.

The joy of doing this is that we can actually get a system off the
ground within a month's work -- okay, it is a lot of argument; they sat
around and bickered about these numbers a lot, but then you actually get
them going and you are updating it and that is just the right way to do
it.

DR. WISE: When you get these numbers, how many clinicians
do you interview?

DR. SPIEGELHALTER: That was just based on a couple of people.

DR. WISE: And you make them give you one range.

DR. SPIEGELHALTER: We did not do a Delphi technique on that, but
just sat down and argued about it. So they are pretty crude things to
get it off the ground, and we hope that the data is going to be
sufficient.

DR. DEMPSTER: On the other hand, David, this trial that you
mentioned about the 16,000 people, that should produce wonderful data.

DR. SPIEGELHALTER: No, they refused to put the symptom data. It
was all wasted. It was due to the organizations in ten hospitals and
all of the stuff is being punched onto micros. The symptom has been
punched on, but it is not being kept. So that was a tragic waste.

And just to answer Stephen, I agree user satisfaction is
important and our design is changing always as the criticisms are made
of it. In terms of the evaluation, whether you are trying to get right
or whether you are trying to get better than the next one, the final
thing is that the evaluation does not affect patient care. Probably
this is most important.

There is now becoming fairly established a four-stage evaluation
procedure. It is following almost exactly the same pattern as the user
trials, where you start with such as initial safety, and then eventually
to stage three direct trials and do a control trial. Each of these
things is important and can be tackled as separate evaluation systems.

DR. DeGROOT: Thank you, David. Are there comments, questions
from the floor?


DR. BROWNSTON: I was struck by an analogy between what you are
doing and something which is quite old fashioned and that is
psychological testing. There are a lot of different ways of looking at
psychological testing. One type of testing is very much like expert
systems, like R-1 or Maxima or Dendral, which are quite deterministic.
The analogy in testing would be to exhaustively ask the students the
multiplication table and see if they know the multiplication table; or
give them certain things, like can they integrate trigonometric
functions by giving them trigonometric functions integrated.

In that case there is really no uncertainty involved and you get
a deterministic answer. Another step up would be in personnel testing
or aptitude testing, where your psychological test is somewhat like an
expert system to do an interview. You ask them a sample of questions
and on the basis of the answers to these questions you determine whether
this person should get this job, or should get admitted to a university,
and so on.

There has to be technology for determining whether this
psychological test is doing what you expect it to do. There has to be
external validation. So you have to do follow-up studies to determine
whether upon using this test you get a higher proportion of successes in
the admission procedure than if you did not use this test. This is what
is called validation.

But there are other techniques for determining coherence which
are called reliability tests. In this case it's the validation which is
similar to calibration, in the sense you are trying to maximize the
number of correct decisions. Then there is also actually an analogy to
user satisfaction, as Stephen Watson pointed out, and it is called face
validity, which is considered to be public relations in testing.

You have to put in enough questions to make it look like you are
testing what you are supposed to be testing, even if you can determine a
person's success in college by asking them if they like eating raw
carrots. There is a test which asks irrelevant questions like that,
which happens to be valid.

So this brings up all sorts of interesting questions about the
nature of building expert systems, especially about validation. I think
validation is one of the most important things. When you can validate,
you must validate to determine if your system is doing that. Then after
these two uses of tests, there is a third use of psychological tests and
that is personality testing, where you are not even sure whether
introversion, extroversion is really a valid dimension of personality,
but you are using the test in an exploratory fashion.

Just so in some domains the question of validation is very
difficult. One of them would be military decision making, because you
don't have a set of observed frequencies. You only have expert
judgments that you can use. So what your expert system is doing is
trying to simulate what the expert knows. In fact, it is perhaps even
doing psychology in trying to do that.


So this is more fuzzy. The progression I was giving was
something very deterministic to something probabilistic to something
which we really don't know what we are doing. And I think that
progression goes into both psychological testing and in what you are
doing. I am glad to see that you are solving problems in very similar
ways to the way that people in psychological measurement can solve them.

DR. SPIEGELHALTER: Yes, the parallel is incomplete there, and we
are interested in working there. There are exhaustive -- and in a sense
this can be viewed as a way of trying to avoid doing exhaustive
deterministic testing, where you could find out what is wrong with them.
This is trying to avoid doing that: instead of asking people questions,
irrelevant questions, you ask a few of them and try to judge essentially
how they would have answered the rest.

The final area about the ill-defined central final outcomes is
very widespread and something I try to avoid. Our disease categories
are well defined, but it is very complicated.

DR. ZADEH: I found this to be a very impressive piece of work
even though there are a great deal of ad hoc procedures of one kind or
another. I think these procedures are basically unavoidable. In other
words, you just can't use formal hearing in this sort of application.

I think there is one problem with approaches of this kind and
that is that you might get drowned in an ocean of information. For
example, we have this ten-page questionnaire. The problem is, with
things of this kind, that any one of the answers in that questionnaire by
itself will not be decisive. It is a little bit like the following:
Suppose you want to decide on whether or not to promote an assistant
professor to associate professor. Instead of asking a few key people,
you ask the students and the secretaries and people not in the
department and you get 10,000 opinions as to whether the man should be
promoted or not. Well, the 10,000 opinions of that kind when aggregated
together would be much less reliable here than the few key assistants.

One of the problems that plays an important role is the issue of
control strategy. What question do you ask next, because depending on
the hypothesis that you are sort of converging on, a set of questions
may or may not be relevant. The uncertainty in some of these systems,
the more sophisticated ones, is used to determine what question to ask
next. This is the way a doctor would proceed.

The way a doctor would proceed would be very much influenced by a
tentative hypothesis that is being formed in the doctor's mind and the
perception of what questions would be central to that hypothesis.
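
The control strategy described here, using the current uncertainty to pick the next question, can be illustrated with a small sketch. It is only one possible reading: it assumes yes/no questions, a known table of P(yes | hypothesis), and expected Shannon entropy as the selection criterion. The hypotheses, question names and numbers are hypothetical and are not taken from any system discussed at the conference.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {hypothesis: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(prior, p_yes, answer):
    """Bayes update after a yes/no answer; p_yes[h] is P(answer is yes | h)."""
    unnorm = {h: prior[h] * (p_yes[h] if answer else 1 - p_yes[h]) for h in prior}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

def expected_entropy_after(prior, p_yes):
    """Average entropy of the posterior, weighted by how likely each answer is."""
    prob_yes = sum(prior[h] * p_yes[h] for h in prior)
    result = 0.0
    for answer, prob in ((True, prob_yes), (False, 1 - prob_yes)):
        if prob > 0:
            result += prob * entropy(posterior(prior, p_yes, answer))
    return result

def next_question(prior, questions):
    """Pick the question expected to leave the least uncertainty."""
    return min(questions, key=lambda q: expected_entropy_after(prior, questions[q]))

# Hypothetical tentative hypotheses and questions (illustrative numbers only).
prior = {"ulcer": 0.5, "gallstones": 0.3, "other": 0.2}
questions = {
    "pain_at_night": {"ulcer": 0.8, "gallstones": 0.2, "other": 0.1},
    "likes_raw_carrots": {"ulcer": 0.5, "gallstones": 0.5, "other": 0.5},
}
print(next_question(prior, questions))
```

A question whose answer is equally likely under every hypothesis leaves the expected entropy unchanged, so the sketch never prefers it to a question that discriminates between the hypotheses currently in play.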

There was a little bit of mention of that thing in the branching
questionnaire. In the case of this last thing when you said, when you
try to define ignorance, you say, okay, we will try to compute the
probability as a function of the possible answer to the question. To me
this is totally an impossible enterprise. In other words, it would be a
mind-boggling exercise. Even in the case of closed systems you would
qualify that. The possibility of different questions to different
answers and the impact that this may have on probabilities would be such
that the answer necessarily would be that the probability would be zero
and one and the expected value is .5. It seems to me that no other answer
really could be obtained to a question like that.

In other words, if you have the wrong questionnaire and you are
trying to compute the probabilities as a function of the possible
answers to this, I don't see anything else in there. I don't know if
you have actually done it or not, but it seems to me that if you have
not done that and if you would do it, this is what you would be arriving
at in total ignorance.

DR.  DeGROOT:  David,  do  you  want  to  respond? 

DR. SPIEGELHALTER: Yes, there are a number of things. First of
all, in GLADYS we consider the patient's time is free. It is costless
and so they have to answer a lot. They get sat down in front of the
screen and have to sit there for a half hour or something like that. So
there is not the need for the control strategy.

In the other system I was talking about, the one for general
practice, then it is a very limited number of questions and then there
is the need for a very stringent control and identifying very important
questions, so that should be incorporated. As for the idea of whether one
can actually do the search and work out the possible values that a
probability could take on, it is quite reasonable that if you ask enough
questions you will get as close as you need to zero and one. However, if
one can do the actual distribution and you don't pull in on the range,
then still that distribution can tell you what your ignorance is about a
particular question. It shows you how sensitive your current belief is
to further information that you could obtain.

The actual computational difficulties are difficult and this is
what I am working on with these people in IMMEDIATE. They think they
can do the search through in order to generate at least a range. And
clearly if you generate the range and it is zero to one, it is useless,
as you say. In which case you really want to get the whole
distribution. It is reasonable that any state in an expert system can
generate plausible further findings it is going to have.
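
The difference between the range and the whole distribution of a future probability can be sketched as follows. This is a toy illustration under loud assumptions: a single binary disease, yes/no questions that are conditionally independent given disease status, and made-up likelihoods. It is not the search procedure referred to in the discussion.

```python
from itertools import product

def posterior_distribution(p_disease, questions):
    """Enumerate every pattern of answers to the remaining questions.

    questions: list of (P(yes | disease), P(yes | no disease)) pairs.
    Returns a list of (posterior P(disease | pattern), P(pattern)) pairs.
    Assumes answers are conditionally independent given disease status.
    """
    out = []
    for answers in product((True, False), repeat=len(questions)):
        like_d, like_nd = 1.0, 1.0
        for ans, (yes_d, yes_nd) in zip(answers, questions):
            like_d *= yes_d if ans else 1 - yes_d
            like_nd *= yes_nd if ans else 1 - yes_nd
        p_pattern = p_disease * like_d + (1 - p_disease) * like_nd
        if p_pattern > 0:
            out.append((p_disease * like_d / p_pattern, p_pattern))
    return out

# Hypothetical current belief and three remaining questions.
current = 0.3
remaining = [(0.9, 0.2), (0.7, 0.4), (0.6, 0.5)]
dist = posterior_distribution(current, remaining)

low = min(p for p, _ in dist)
high = max(p for p, _ in dist)
mean = sum(p * w for p, w in dist)
print(f"range: [{low:.3f}, {high:.3f}], expected posterior: {mean:.3f}")
```

The range alone only says where the probability could end up; the weights attached to each value show how likely each destination is, and their average always equals the current belief.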

DR. YAGER: I just want to say something about the validation. I
agree it is a very important issue. It seems to me that there are at
least two considerations one has to have. One is how truthful the
system is, how well it predicts the right answer.

For  example,  if  you  have  a  weather  forecast,  in  an  expert  system 
it  sort  of  predicts  a  high  temperature  tomorrow.  So  one  consideration 
is  the  issue  of  whether  it  does  indeed  predict  the  right  temperature 
tomorrow.  But  a  second  consideration,  a  very,  very  important  one,  is 
how  specific  your  answer  was  in  the  sense  that  if  you  have  an  expert 
system  that  predicts  that  the  weather  will  be,  let's  say  over  30  degrees 
tomorrow,  that  is  his  prediction,  it  is  very  unspecific.  It  will  always 
be  correct  but  it  won't  be  that  informative. 


So I think for validation you have to consider two factors in the
sense they contradict each other. The correctness of the answer as well
as the specificity of the answer, if you want a specific answer. If the
answer is not very specific, even though it is correct, it is not
useful. I think you have to always consider those two conditions and
they sort of fight with each other.

DR. SPIEGELHALTER: That can be done formally. If your
prediction is done in terms of a distribution, then one can use a scoring
rule that is related to the ordinate of that distribution at the true
value. Say it is way out in the tail. It may be a value you have
given some support to, but it is way out in the tail and there are lots
of other values you consider much more likely. So it is pretty useless.
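
The trade-off between correctness and specificity can be made concrete with a scoring rule based on the height of the predictive distribution at the value that actually occurs. The sketch below assumes a normal predictive distribution and the logarithmic score; both are illustrative choices, and the temperatures are invented.

```python
import math

def log_score(true_value, mean, sd):
    """Log of the normal predictive density at the observed value (higher is better)."""
    z = (true_value - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))

observed = 31.0  # hypothetical temperature that actually occurred

sharp_and_right = log_score(observed, mean=30.0, sd=2.0)   # specific and close
vague = log_score(observed, mean=30.0, sd=30.0)            # "correct" but unspecific
sharp_and_wrong = log_score(observed, mean=15.0, sd=2.0)   # truth far out in the tail

print(sharp_and_right, vague, sharp_and_wrong)
```

A very spread-out forecast is never badly wrong, but it puts little density anywhere and so scores poorly; a sharp forecast scores well only if the truth lands near its center and is punished heavily when the truth falls in its tail.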

DR. DeGROOT: It is time for a coffee break. I thank you again,
David.


(Recess) 

DR. DeGROOT: Let's resume the discussion that we were having.
If there are questions pertaining to David Spiegelhalter's talk, we will
welcome them now. Also, it would be nice to hear from others who have
had experience with particular expert systems. I would be very
interested in hearing experience as to the practicality of using the
different methods that we have been discussing in real live expert
systems, the practicality of implementing Bayesian methods or fuzzy
methods or belief function methods and so on.

If  you  have  such  comments,  experiences,  even  if  they  are  not  tied 
directly  to  questions  or  the  particular  talks,  I  am  sure  there  are  many 
of  us  here  who  would  like  to  hear  about  those,  so  don't  be  bashful  about 
telling  them  to  us. 

DR. SINGPURWALLA: This is really not a question, but more of a
comment. It seems to be coming up over and over again and I think it
pertains to this hierarchy that David showed this morning. I am
constantly reminded of fault trees and event trees. The important point
that I want to make is that you mentioned the notion of importance
somewhere along the way, and you said it was wavy, heuristic, and so
forth.


In  fault  tree  analysis,  we  do  have  the  notion  of  importance, 
which  is  mathematically  precise.  There  are  two  measures  of  importance. 
One is what we call structural importance and the other is what we call
reliability importance; the latter is a probabilistic notion. I am
suggesting that these kinds of notions be considered in the context that
you  are  interested  in,  and  you  may  find  them  useful. 

You can look at a certain node and judge its importance based on
its  structural  position  and  also  based  on  the  probabilities  that  you  are 
willing  to  assign  to  it.  I  believe  Professor  Zadeh  was  also  referring 
to  the  question  of  importance. 
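
The two importance measures mentioned here can be sketched on a tiny fault tree. The definitions used are the usual Birnbaum-style ones from reliability theory, which is an assumption about what the speaker had in mind; the tree (TOP = A OR (B AND C)), the event names and the failure probabilities are hypothetical, and basic events are assumed independent.

```python
from itertools import product

def top_event(states):
    """Hypothetical fault tree: TOP occurs if A fails, or if both B and C fail."""
    return states["A"] or (states["B"] and states["C"])

def structural_importance(component, names, structure):
    """Fraction of states of the other components in which `component` is critical."""
    others = [n for n in names if n != component]
    critical = 0
    for combo in product((False, True), repeat=len(others)):
        states = dict(zip(others, combo))
        with_failure = structure({**states, component: True})
        without_failure = structure({**states, component: False})
        if with_failure != without_failure:
            critical += 1
    return critical / 2 ** len(others)

def top_probability(probs, structure):
    """P(TOP) by full enumeration, assuming independent basic events."""
    names = list(probs)
    total = 0.0
    for combo in product((False, True), repeat=len(names)):
        weight = 1.0
        for name, failed in zip(names, combo):
            weight *= probs[name] if failed else 1 - probs[name]
        if structure(dict(zip(names, combo))):
            total += weight
    return total

def reliability_importance(component, probs, structure):
    """P(TOP | component failed) minus P(TOP | component working)."""
    return (top_probability({**probs, component: 1.0}, structure)
            - top_probability({**probs, component: 0.0}, structure))

names = ["A", "B", "C"]
probs = {"A": 0.01, "B": 0.1, "C": 0.2}  # illustrative failure probabilities
for n in names:
    print(n, structural_importance(n, names, top_event),
          round(reliability_importance(n, probs, top_event), 4))
```

Structural importance depends only on where the node sits in the tree, while reliability importance also depends on the probabilities assigned to the other nodes, which matches the distinction drawn in the comment.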



The  second  point  is  that  somewhere  you  mentioned  the  inadequacy 
of  probability  theory.  You  cited  two  situations.  One  pertains  to 
outcomes  which  are  non-binary.  Is  that  right? 

DR. SPIEGELHALTER: I meant just ill-defined propositions.

DR.  SINGPURWALLA:  All  right,  I  won’t  pursue  the  non-binary  issue 
for  now.  The  second  issue  you  mentioned  were  questions  of  linearity, 
normality  of  form  and  things  like  that.  I  believe  that  models  are 
personal  expressions  and  the  normality,  if  ever,  in  a  linear  model  is 
something  that  you  as  the  modeler  subjectively  specify;  and  there  is 
nothing  about  its  truth  or  falsity. 

DR. SPIEGELHALTER: Yes, it is an idea of assuming something, but
in order to assume normality within a system one wants to get some idea,
is there evidence against it, which suggests some sort of known
probability.

I  would  not  like  to  talk  about  the  probability  of  normality.  I 
don't  really  feel  like  I  knew  what  I  was  talking  about  at  that  point  and 
so  I  am  prepared  to  see  that  there  may  be  areas  in  terms  of  control 
strategies  where  you  do  want  to  make  assumptions  where  some  strictly 
non-probabilistic measures have evidential support or something might be
used. 

DR. SINGPURWALLA: Getting back to this issue of normality and
linearity of errors in linear models, the sampling theory approach to
these issues would be the analysis of residuals, and wouldn't that still
be within the framework of probability?

DR.  SPIEGELHALTER:  No,  it  would  be  in  the  framework  of  tail 
areas  which  is  not  in  the  strict  probabilistic  range.  You  can  make  some 
judgment  based  on  the  tail  area  in  the  distribution  consisting  of 
hypothesis,  which  is  not  saying  anything  about  the  probability  of  the 
hypothesis.  So  that  is,  strictly  speaking,  nonprobabilistic  reasoning. 

DR.  SINGPURWALLA:  The  third  comment  I  think  is  a  more  general 
one  and  I  don't  know  what  the  answer  is,  but  some  mathematicians  in  this 
audience  could  probably  answer  it  a  little  bit  more  precisely.  How  much 
of  our  knowledge  of  mathematics  is  based  on  the  notion  of  binary 
variables? 

If  much  of  it  is,  then  arguments  against  the  use  of  probability 
theory  essentially  would  not  in  any  way  fill  that  gap.  Everything  that 
we  can  think  about  is  in  binary  terms  and  I  would  think  probability 
theory  is  adequate  to  deal  with  it. 

DR.  SPIEGELHALTER:  Again,  it  is  ideas  of  imprecision  of  the  fact 
that  something  could  be  probably  true. 


DR. YAGER: It seems to me it is sort of a moot question in a
way, because if everything relates on binary then all of arithmetic -- I
mean then why do any arithmetic other than with ones and zeros. Why
introduce the numbers three, five, seven and so forth? Maybe you can do
everything, I am not sure, but maybe you can do everything in binary.

But the point of the matter is that is not the most effective way
to think, or to communicate, or to do manipulations. So I don't think
it really matters.

DR. SINGPURWALLA: Three, four is a build up on the binary
system, so I don't think your counter example is that good.

DR. WISE: It seems to me that your objective is better met by
the example of real numbers, which they have not even, their cardinality
does not even correspond to binary things. But you can talk about real
numbers with propositions which are themselves either true or false.
Every formula you write, plus a set theory, is a proposition which is
either true or false, when you talk about the probability that a real
number falls in an interval, or in another real interval, or another.
So in one way you are handling continuous things very easily but you are
working only with those propositions which are true or false and you are
talking about probability theory.

DR. SINGPURWALLA: You have not told me that there is a
multi-valued logic.

DR.  WISE:  There  are  those  too,  but  those  are  the  propositions. 

DR. SINGPURWALLA: You just have two.

DR. WISE: But those are propositions. In multi-valued logic
you have just got different propositions which are themselves either
true or false. They have no value. They may have a value true or false
or unknown one or unknown two, but they either have that value or they
don't have that value.

DR. YAGER: But you could go on ad infinitum and you can add your
multi-value logic on.

DR. WISE: Sure, math gets complicated.

DR. ZADEH: There is a more basic issue of that that calls into
question this kind of physical analysis. That is, most of the events,
most of the propositions here, are really fuzzy events. I think you
will admit that when you talk about high fever or hardening of the
arteries and when you talk about having gallstones. All of these are
fuzzy events. In other words, this particular disease could be present
to a degree. Now some are more that way than others.


Infectious diseases tend to be sort of yes or no types. Either
you have tuberculosis or you don't have it. But degenerative diseases
are not like that. They are generally a matter of degree. Even in the
case of pain, you have severe pain and you have frequent pain, but how
frequent? If you are confronted with a particular situation and you ask
and say is it frequent pain or infrequent pain, then it becomes somewhat
artificial to force the patient to say yes or no. Because this is not a
natural thing.

I think in your presentation you said we sort of treat it as some
sort of a proposition, but strictly speaking this is not really valid.
It is not really valid and furthermore when it comes to assessing the
probabilities then you have to be able to come up with the concept of
cardinality. In other words, you count the number of patients who have
hardening of the arteries in the presence of certain other conditions.
But if hardening of the arteries is a matter of degree, then how do you
tell this particular individual who has hardening of the arteries that
it is .3 or .7, .9, so that you have a succession of cases but each one
of these cases is sort of a matter of degree.

So that strictly speaking none of these probabilities or very few
of them can be assessed in classical probability terms, because you are
not really dealing with crisp events. So because of this the classical
things that we take for granted are not valid. For example, the
standard thing that is the case and let's use expert systems in MYCIN
pathology (inaudible) is that the conditional probability of A given B
is equal to one minus the conditional probability of not A given B.
That formula is not valid.

That is if you assume that A and B are fuzzy predicates, if you
assume that they correspond to these things like high fever, hardening
of the arteries and so forth, that is not valid. So you have an
immediate breakdown, an immediate breakdown. You don't have to go far.
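
One way to see the breakdown described here is to count patients by degree of membership rather than by yes or no. The sketch below assumes a particular definition that is not spelled out in the discussion: the conditional "probability" of A given B is taken as the sum of min(membership in A, membership in B) divided by the sum of memberships in B, with the complement of A having membership one minus that of A. The patients and degrees are hypothetical.

```python
def fuzzy_conditional(mu_a, mu_b):
    """Relative count of A within B when membership is a matter of degree.

    Assumed definition: sum of min(mu_A, mu_B) over patients, divided by sum of mu_B.
    """
    joint = [min(a, b) for a, b in zip(mu_a, mu_b)]
    return sum(joint) / sum(mu_b)

# Three hypothetical patients, with degrees of each condition.
hardening = [0.3, 0.7, 0.9]    # degree of "hardening of the arteries"
high_fever = [0.5, 0.5, 1.0]   # degree of "high fever"
not_hardening = [1 - a for a in hardening]

p = fuzzy_conditional(hardening, high_fever)
q = fuzzy_conditional(not_hardening, high_fever)
print(p, q, p + q)

# With crisp (0 or 1) memberships the same function behaves classically.
crisp_a, crisp_b = [1, 0, 1], [1, 1, 0]
crisp_not_a = [1 - a for a in crisp_a]
print(fuzzy_conditional(crisp_a, crisp_b) + fuzzy_conditional(crisp_not_a, crisp_b))
```

Under this assumed definition the two conditional values for the graded case add up to more than one, while for crisp events they add up to exactly one, which is the classical identity being questioned.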

The rules of modus ponens break down. Now you try to patch that
up, and that is what is done in MYCIN a little bit. You set some sort
of a threshold, but these are highly unsatisfactory ways of coming to
grips with these issues. What I am trying to say is most people, and
that applies to all of us, use whatever techniques they feel comfortable
with and they tend to be skeptical of techniques that they are
unfamiliar with. This is a very natural sort of a thing.

But at the same time I think that one has to consider the fact
that in problems of the order of complexity that David has described,
classical probability techniques can be used only in special situations
to a limited extent. Beyond that, it becomes a matter of closing your
eyes and making all sorts of assumptions by the dozens, all sorts of
approximations, disregarding dependencies, disregarding the type of
things that overlap, disregarding the fact that they don't have sharp
boundaries. All of these things are disregarded.


So the question that arises is how much reliance can be put on
the numerical probability at the very end, like .25? My contention is
very little. It is a label for what is in effect a ball park figure,
like low. That is the most that you can say and anything beyond that is
unrealistic. True, people use it and it is useful. That does not mean
it is not useful. But at the same time we have to have our eyes open.

We have to realize that we will be deluding ourselves if we took
those figures seriously. Just as the figure of .2 with a probability
that Shakespeare wrote Hamlet cannot be taken seriously. It cannot be.
So the point that I was trying to make in my own presentation and this
is by nature of comment on this thing, it is not a matter of using one
technique versus another technique. It is a matter of trying to find an
accommodation with a very pervasive imprecision.

In  fact,  I  think  in  the  field  of  medical  diagnosis  it  is  at  this 
point  far  too  complex  in  relation  to  the  understanding  of  these  issues 
that  we  have.  We  have  ventured  far  beyond  what  we  know  about  reasoning 
under  uncertainty,  imprecision,  the  issue  of  what  questions  to  ask,  the 
issue  of  what  tests  to  perform,  considering  the  fact  that  these  tests 
have  certain  risks  associated  with  them  and  so  forth.  So  that  strictly 
speaking, the level of our knowledge at this point suffices for expert
systems in very narrow, specialized domains in which we don't have the
risk  of  the  kind  that  you  have  in  medical  systems. 

Now,  it  does  not  mean  that  we  should  not  do  that.  I  think,  as  I 
mentioned  originally,  I  was  very  much  impressed  by  the  system  that  was 
developed.  It  is  a  useful  system.  It  is  an  effective  system.  It  may 
have  some  flaws  here  and  there,  but  it  is  a  working  system. 

DR. DeGROOT: What system are you referring to now?

DR.  ZADEH:  This  system  that  David  described.  And  the  same 
applies  to  other  systems,  like  MYCIN,  so  whatever  criticisms  you  make  of 
those  systems  does  not  mean  that  they  are  not  useful  systems.  It  means 
merely that we cannot merely justify all of these things in terms of
formal theories. That is all it means. So that what we could justify
at this point, as I said earlier, would be things that would be far less
ambitious. That is my feeling.

DR.  DeGROOT:  Thank  you.  That  was  very  clearly  stated.  David, 
do  you  have  a  comment? 

DR.  SPIEGELHALTER:  I  agree  that  if  one  was  genuinely  trying  to 
say,  was  using  phrases  like  the  patient  sometimes  wakes  at  night  with 
pain  and  the  pain  is  very  severe  and  from  that  you  are  trying  to 
conclude  a  statement  like  he  has  a  duodenal  ulcer,  that  is  a  very  fuzzy 
statement,  degrees  and  degrees  of  it.  Then  the  calculus  of  probability 
perhaps would be unreasonable and you are talking about ill-defined
statements. 


The  argument  against  that  I  used  before  is  to  say  that  that  is 
not  what  we  are  dealing  with.  We  are  in  a  sense  trying  to  make  it 
amenable  to  probability  and  to  admit  that  you  have  to  do  this  to  make  it 
amenable to probability. We are crispifying the statements by saying
that  everything  up  there  was  a  shorthand  for  when  forced  to  say  yes  or 
no  to  the  question  is  the  pain  —  we  don't  use  phrases  like  is  the  pain 
severe,  but  when  forced  to  say  yes  or  no  to  the  question  do  you  get 
early  repletion  when  you  eat  a  meal  and  the  patient  answers  yes,  and 
from that you conclude the probability that it will be concluded at six
months  when  forced  to  say  yes  or  no  that  the  patient  has  a  duodenal 
ulcer,  the  doctors  will  say  that  he  has  a  duodenal  ulcer. 

So  we  acknowledge  that  if  we  did  not  make  this,  put  their  backs 
against  the  wall  and  make  the  patient  sit  down  in  front  of  a  terminal 
and make the doctor fill in one box in the form about the final
diagnosis,  then  calculus  might  be  unreasonable  and  frankly  I  wouldn't 
really  know  what  to  do,  because  I  wouldn't  really  know  what  I  was 
talking  about  at  that  point. 

But by doing this, by forcing them physically to answer yes or no
to a question we are crispifying it. Therefore I feel that the
probability is not invalid and that our statements are well-defined and
the numbers are well-defined and have an operational meaning. But to
justify that one has to see everything that is written down as a
shorthand for the statement when forced to answer yes or no to this
proposition and they answered yes. It may just seem like a way out, but
I think that it does mean that I feel that the numbers we are talking
about are valid and are justified and do mean something.

DR. ZADEH: Suppose that a patient comes to you and suppose that
you have to check one of these things, the patient has frequent pain.
On what basis then would you say that the patient has frequent pain?
Suppose he tells you that he has it two times a night or three times a
night or five times a night and so forth, at which point would you say
that the patient has frequent pain?

What I am trying to say is you cannot sweep this under the carpet
by saying that you will treat that as some sort of a crisp proposition,
because you will have to indicate which class that falls into. Where
will be that threshold? Is it specified or not? This is my question.

DR. SPIEGELHALTER: First of all, we ought to make every effort
to make a proposition, the questions, as crisply defined as possible.
How many times do you wake up with pain, and those are often collapsed
back down into categories. But that is not really to get rid of the
fuzziness solely. It is in order to increase the discriminatory power of
the questions.

That I see is the vast advantage of a computer interview for a
clinician. Exactly the same question is asked to everybody. There is
no judgment on my part. I never fill in the form. Doctors don't fill
in the forms. They never have to make a judgment about whether the pain
is more frequent or not. It is purely whether the patient pressed the
button saying yes or no when asked this question. Now, clearly
different patients will respond differently.

One person may wake up three times a night and one person may
wake up once a week and they both press the button frequent. It is
quite possible that people have completely different interpretations of
the word. But that variation, that respondent variation is taken into
account by the discriminating power of that question from the data. So
provided that our system is based only on responses to a strict question
and that is the way the system is built and that is the way it is
supposed to be used, then I don't see any difficulty with the fact that
different people may interpret the question in different ways.

However, there would be a big difficulty if clinicians just
said, oh, this is what I think is frequent pain and put that into the
system and then it was used by people who interpreted it in a different
way. That would be a very bad thing to do. But provided the term is
used with the same amount of vagueness by everybody then it can be
treated as a crisp term.

DR.  DeGROOT:  Glenn,  did  you  have  a  comment  on  this  point? 

DR. SHAFER: I would just like to support on this point about
starting  the  investigation  by  making  things  that  in  the  ordinary  course 
of  discussion  are  vague  as  crisp  as  possible.  It  seems  to  me  that  is  a 
general  aspect  of  scientific  investigation  and  there  is  no  particular 
need  to  apologize  for  it. 

DR.  REEVES:  One  of  the  criteria  suggested  was  user  satisfaction. 
Do  the  clinicians  understand  what  they  are  receiving  in  way  of  data  is 
in  fact  that  the  person  pushed  this  button,  not  that  the  person  has 
frequent  urination  or  whatever  it  may  be  and  do  they  accept  that  as 
having,  as  sort  of  an  educational  thing  as  was  mentioned  before?  They 
say it has face validity, but in fact they are good indicators.

DR. SPIEGELHALTER: Yes, at the moment, because the people using
it have been involved peripherally in its development. I think it is
fairly clear. There is a big danger though that it could start, what
the machine prints out could start being used as truth rather than just
what someone responded when asked a question on the screen. At that
point, it does start becoming possibly dangerous. The machine's
colloquialism is based on a very well-defined, precise idea, which is
how the patient pressed the button and I can see there is a danger of
the systems in the future.

DR. SOYER: On this problem of validation, for example, normality
of errors, I think it is again a problem where you consider scoring.
For example, the normality of errors. You never observed errors. You
only have estimates of errors in the model, but you will eventually
observe whether the patient is going to have a certain disease or not.
So when you are validating your model you will be validating based on
those observable values.


I think then the scoring comes into the picture again here,
because I think for any scoring rule, for strictly proper scoring rules,
it is decomposable into two parts where one takes care of the calibration
and the other is a measure of information and I think this can be nicely
used in the validation part.
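
The decomposition mentioned here can be illustrated with the quadratic (Brier) score for yes/no outcomes, which is one commonly used scoring rule and is chosen here only as an example. The sketch assumes forecasts take a small number of distinct values so that cases can be grouped by forecast; the forecasts and outcomes are invented.

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Split the mean quadratic score into a calibration part and an information part.

    forecasts: stated probabilities of the event; outcomes: 1 if it occurred, else 0.
    Returns (brier, calibration, refinement) with brier == calibration + refinement.
    """
    n = len(forecasts)
    groups = defaultdict(list)
    for f, y in zip(forecasts, outcomes):
        groups[f].append(y)
    calibration = 0.0
    refinement = 0.0
    for f, ys in groups.items():
        rate = sum(ys) / len(ys)                   # observed frequency for this forecast
        calibration += len(ys) * (f - rate) ** 2   # gap between stated and observed
        refinement += len(ys) * rate * (1 - rate)  # how far observed rates are from 0 or 1
    brier = sum((f - y) ** 2 for f, y in zip(forecasts, outcomes)) / n
    return brier, calibration / n, refinement / n

# Hypothetical diagnoses: five patients given 0.2 and five given 0.8.
forecasts = [0.2] * 5 + [0.8] * 5
outcomes = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
brier, calibration, refinement = brier_decomposition(forecasts, outcomes)
print(brier, calibration, refinement)
```

In this invented data the stated probabilities match the observed frequencies, so the calibration part is zero and the whole score comes from the second part, which measures how sharply the forecasts separate the cases.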

DR.  SPIEGELHALTER:  Yes,  when  you  have  got  observable  outcomes, 
then you can use the scoring techniques in decompositions. It is just
when  you  have  got  unobservable  outcomes,  it  does  not  seem  to  be  very 
clear  how  that  has  been  assessed. 

DR.  DeGROOT:  Let  me  exercise  the  moderator's  prerogative  to 
follow-up  on  that  with  a  comment  of  my  own.  It  seems  to  me  that  one  big 
advantage  of  using  probability  methods  is  that  it  is  possible,  as  you 
mentioned,  David,  to  do  an  evaluation  of  an  expert  system  and  to  compare 
different  expert  systems.  In  particular  one  can  think  about  the 
predicted  distribution  as  you  spoke  about,  the  predicted  distribution  of 
the  outcome  or  the  output  of  the  system  of  the  final  probability,  for 
example,  of  the  probability  distribution  of  the  various  diseases  and  it 
is  important.  Many  of  you  know  this,  but  some  may  not  that  when  you 
think  about  these  predicted  distributions  to  evaluate  a  system,  where  do 
I  think  I  will  end  up  after  I  have  collected  data  on  this  patient. 

We  know  that  we  would  like  our  final  answer,  to  use  terms  that 
Professor  Zadeh  and  others  have  been  using,  we  would  like  our  final 
probability  to  be  as  tight  as  possible.  That  is,  in  the  terms  I  was 
mentioning yesterday, if we think about the final overall probability as
the  weighted  average  of  various  conditional  probabilities,  we  would  like 
in  our  final  answer  all  of  those  conditional  probabilities  to  be  very 
similar,  then  we  would  feel  very  certain  and  reassured  that  our  answer 
is  stable  in  the  sense  that  a  little  bit  of  further  knowledge  would  not 
change  it  very  much. 

But  at  the  beginning  of  the  process,  before  we  have  collected  the 
data  and  we  think  about  what  is  our  final  probability  likely  to  be  and 
the  uncertainties  attached  to  what  that  final  probability  would  be,  we 
want  that  predicted  distribution  to  be  as  spread  out  as  possible. 

The  best  expert  systems  are  the  ones  that  have  as  broad  a  range 
as  possible  of  where  you  are  going  to  end  up.  I  mean  that  is  easy  to 
see  really,  if  you  think  about  it,  because  a  useless  expert  system  is 
one  where  you  know  where  you  would  end  up  before  you  began,  so  then  you 
don't  need  the  system,  if  it  is  not  going  to  change  your  prior 
probability,  for  example. 

So  the  good  expert  systems,  the  refined  expert  systems  are  ones 
where  you  are  very  uncertain  when  you  have  entered  the  process  where  you 
are  going  to  end  up.  So  the  most  spread  out  distributions  for  your 
posterior probabilities are the best ones. I just raise as a question,
and perhaps others will answer it later in the discussion, how that
concept of comparing expert systems on those bases could be used outside
of  the  probabilistic  methods. 
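
The point about the predicted distribution of the final probability can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: a single binary test, a prior of .3, and sensitivity and specificity values that are hypothetical and not taken from the discussion. It shows that the predicted posterior always averages back to the prior, so what separates a useful system from a useless one is how spread out that predicted posterior is.

```python
def posterior_spread(prior, sensitivity, specificity):
    """Mean and variance of the posterior probability of disease,
    taken over the test results that might be observed.

    Assumes 0 < prior < 1 and a test that can return either result.
    """
    # Probability of each test result before the test is run.
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
    p_neg = 1 - p_pos
    # Bayes' rule for each possible result.
    post_pos = sensitivity * prior / p_pos
    post_neg = (1 - sensitivity) * prior / p_neg
    # Predicted (pre-posterior) mean and variance of the final probability.
    mean = p_pos * post_pos + p_neg * post_neg
    variance = p_pos * (post_pos - mean) ** 2 + p_neg * (post_neg - mean) ** 2
    return mean, variance


# Hypothetical systems: one whose test is pure noise, one whose test is informative.
useless_mean, useless_var = posterior_spread(0.3, 0.5, 0.5)
sharp_mean, sharp_var = posterior_spread(0.3, 0.9, 0.9)
```

In both cases the mean equals the prior of .3, but the noise test leaves the posterior at .3 whatever is observed (zero variance), while the informative test moves it well away from .3 in either direction.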




Another way the probability methods enter is in terms of
calibration. David mentioned calibration. Good systems should be
calibrated in the sense that he mentioned. But calibration, as was just
commented on, is only part of the story, and it is possible
to compare well-calibrated systems in terms of what I call refinement
or sufficiency or informativeness; that is, again you want a system that
is not just well-calibrated but one that gives you a wide range of
probabilities, as broad a range as possible.

It is very easy to be well-calibrated: if you know that 30 percent
of the people have duodenal ulcers, then all you have to do whenever a
patient comes is say the probability is 30 percent and ignore all of the
tests. That system is well-calibrated. It will be right 30 percent of
the time and it says it is going to be right 30 percent of the time, but
it is useless. It does not take an expert to make that statement.

So what you want is a very sensitive or spread out or refined
distribution of probability. So I just wanted to make that point. In
terms of probability, it is possible to make the evaluations and
comparisons, and it is a slightly interesting point, if you have not
thought about it before, that you really want highly variable
probabilities. Those are the best systems.
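
The distinction between calibration and refinement can be made concrete with a scoring-rule decomposition. The sketch below assumes the quadratic (Brier) score, which the speakers do not name, and uses its standard split into a calibration term (reliability), a refinement term (resolution), and the base-rate uncertainty. The ten-patient data set and the "refined" forecaster are invented for illustration; only the 30 percent figure comes from the discussion.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared difference between stated probability and 0/1 outcome."""
    return sum((f - y) ** 2 for f, y in zip(forecasts, outcomes)) / len(outcomes)


def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty).

    Brier score = reliability - resolution + uncertainty, exactly, when
    forecasts are grouped by their stated value as done here.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for f, y in zip(forecasts, outcomes):
        groups[f].append(y)
    reliability = 0.0  # calibration: stated value versus observed frequency
    resolution = 0.0   # refinement: how far the groups move from the base rate
    for f, ys in groups.items():
        observed = sum(ys) / len(ys)
        reliability += len(ys) * (f - observed) ** 2
        resolution += len(ys) * (observed - base_rate) ** 2
    return reliability / n, resolution / n, base_rate * (1 - base_rate)


# Invented data: 3 of 10 patients have a duodenal ulcer.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
constant = [0.3] * 10                      # always says 30 percent
refined = [0.75] * 4 + [0.0] * 6           # hypothetical system that uses the tests
```

Both forecasters have zero reliability error, so both are perfectly calibrated on this data, but the constant forecaster has zero resolution while the refined one does not, and the refined one earns the lower (better) Brier score.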




GENERAL  DISCUSSION  I* 


DR. DeGROOT: Okay, let us resume. The title of this session is
"General  Discussion." 

All  of  the  talks  are  open.  Are  you  going  to  say  something 
inflammatory  to  get  us  going,  Nozer? 

DR.  SINGPURWALLA:  I  will  say  a  few  things  as  a  member  of  the 
audience  who  is  used  to  the  calculus  of  probability,  and  further 
enlightened by Dennis Lindley's visit and companionship at GW.

I  am  still  having  a  problem  in  understanding  possibility  theory, 
fuzzy  logic  theory,  and  also  belief  functions.  And  I  must  tell  you 
quite  honestly  that  I  have  made  several  attempts  to  read  the  papers. 

I  have  had  trouble  trying  to  get  at  the  root  of  the  matter  but 
that  may  be  due  to  my  own  weaknesses.  The  thing  that  is  coming  out  of 
today's discussion applies to fuzzy logic. Please correct me if I am
wrong. 


It  appears  to  me  that  fuzzy  logic  and  possibility  theory  somehow 
do not apply to statements of uncertainty. I get the impression that it
does apply to something which is imprecise; that is, something which is
neither yes nor no, but something which is maybe, like an item is not
failed,  but  partially  failed.  It  is  not  raining  or  dry  but  slightly 
raining.  Is  that  correct?  I  believe  I  gathered  the  impression  from  one 
of  Steve  Watson's  viewgraphs. 

I think he said that fuzzy logic applies to imprecision rather than
uncertainty  but  one  can  carry  the  argument  further  and  say  that 
imprecision  implies  uncertainty.  If  that  be  the  case,  then  my 
conclusion  is  that  the  calculus  of  probability  should  be  sufficient,  the 
scoring  rule  argument  supporting  its  basis. 

After  we  settle  this  issue,  I  think  I  would  like  to  ask  Professor 
Shafer  and  Professor  Dempster  to  make  one  or  two  convincing  arguments  as 
to  why  we  should  be  concerned  with  belief  functions  and  why  we  should 
use  them.  Somehow  their  message  has  not  come  out  clearly,  at  least  to 
me. 


(Laughter)


*Session followed presentations of
Drs. Shafer, Zadeh, and Lindley and
ensuing discussions.


DR.  DeGROOT:  I  hope  we  don't  have  to  wait  until  after  we  have 
settled  this  issue,  to  use  your  phrase,  before  we  can  discuss  some  other 
topics. 

(Laughter)

I  think  it  is  too  much  to  hope  that  we  are  going  to  settle  issues 
at this conference. I think we are going to expose issues at this
conference. 

DR.  SINGPURWALLA:  Who  knows?  We  may  even  settle  it. 

DR. DeGROOT: Please, not today, or we won't have anything to do
tomorrow  if  we  settle  it  this  afternoon. 

(Laughter) 

Did  you  want  to  comment? 

DR.  YAGER:  I  want  to  comment.  I  guess  I  like  to  use  the  term 
uncertainty  as  sort  of  a  general  term  and  sort  of  what  you  call 
probability.  I  call  that  sort  of  randomness. 

DR.  SINGPURWALLA:  No.  By  uncertainty,  I  mean  something  very 
precise.  What  is  the  temperature  outside?  I  don't  know.  I  will  be 
able  to  measure  it  later  on.  So  uncertainty  is  something  which 
typically  reveals  itself,  unless  I  am  dealing  with  parameters.  Most  of 
the  situations  of  uncertainty  eventually  reveal  themselves.  So 
uncertainty  is  very  clear  to  me. 

DR.  YAGER:  First  of  all,  in  classic  probability  theory,  much  of 
the  information  is  sort  of  imprecise  information.  The  fact  of  the 
matter  is  that  when  you  have  probabilities,  for  example,  you  have  them 
imprecisely. 

For  example,  the  quantifiers  that  Professor  Zadeh  talked  about. 
Most  students  do  this  and  so  forth.  That  is  really  a  probability.  So 
much  of  the  probabilistic  information  itself  is  fuzzy  information  or 
imprecise  information. 

So  in  dealing  with  and  manipulating  probability  in  the  ways  that 
Professor  Lindley  talked  about  you  have  to  really  manipulate  fuzzy 
numbers and fuzzy information.

DR.  SINGPURWALLA:  I  am  not  sure  I  understand  you.  To  me, 
probability  is  an  expression,  a  numerical  expression  of  the  way  I  assess 
an  uncertain  situation.  By  definition  it  cannot  be  precise.  It  is  the 
way one expresses uncertainty about something. It cannot be precise,
because  it  is  personal. 


DR. YAGER: If I ask you, for example, what is the probability it
is  going  to  rain  tomorrow,  what  would  you  say? 

DR.  SINGPURWALLA:  Oh,  I  will  think  about  it.  I  will  have 
background  information  and  based  on  that  I  will  say  the  probability  that 
it  is  going  to  rain  tomorrow  is  some  number  P. 

DR.  YAGER:  What  is  the  number  you  say? 

DR. SINGPURWALLA: Well, I will say .8 and I am willing to bet
with  you  eighty  cents  to  the  dollar.  Eventually  somebody  will  score  me 
on  such  bets. 

DR.  YAGER:  What  I  am  willing  to  say  is  something  like,  for 
example,  I  am  willing  to  allow  you  to  say  .8  but  I  am  also  willing  to 
allow you to say the probability is close to .8, or near .8, or around .8.

DR.  SINGPURWALLA:  I  may  round  it  off  to  .8763  if  that  is  what 
you  are  after.  Because  to  me  the  number  doesn't  mean  anything  absolute 
in  a  certain  sense.  It  is  not  some  physical  quantity  that  I  am  after. 
Therefore  I  fail  to  see  the  notion  of  fuzziness  and  I  am  trying  to  keep 
an  open  mind. 

DR.  DeGROOT:  Let  me  rephrase  the  question,  Nozer.  You  say  that 
probability  is  a  numerical  measure  of  your  uncertainty,  but  why  isn't  it 
legitimate  to  ask  the  question  "what  is  your  uncertainty  about  that 
numerical  measure?" 

DR.  SINGPURWALLA:  Your  question  is  legitimate.  I  could  of 
course  add  a  probability  distribution  to  that  original  probability. 

That  is,  I  could  add  a  hierarchy  to  it,  and  do  everything  using  the 
calculus  of  probability.  The  concept  is  very  easy,  even  though  its 
implementation  and  application  might  be  difficult.  I  do  understand  what 
I  would  be  doing.  It  seems  to  have  a  logical  and  fundamental  basis. 

I  am  at  a  loss  as  to  why  I  need  these  other  notions,  unless  these 
other  notions  can  make  a  convincing  case. 
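
The hierarchy described here, a probability distribution placed on the original probability, can be sketched in a few lines. The example assumes a Beta distribution for the unknown chance of rain, which is a common choice but is not something the speaker specifies, and the parameter values are hypothetical. Two assessors both report .8 for rain, yet they differ in how firmly they hold it, and that difference shows up once new data arrive.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: the single number one would report."""
    return a / (a + b)


def update(a, b, rainy_days, dry_days):
    """Conjugate update of a Beta(a, b) distribution after observing new days."""
    return a + rainy_days, b + dry_days


# Hypothetical second-level distributions, both with mean .8.
vague = (4.0, 1.0)      # loosely held .8
firm = (80.0, 20.0)     # firmly held .8

# Both assessors then observe five dry days in a row.
vague_after = beta_mean(*update(*vague, 0, 5))
firm_after = beta_mean(*update(*firm, 0, 5))
```

The loosely held .8 falls to .4 while the firmly held .8 moves only to about .76, so the uncertainty about the number is carried entirely within the calculus of probability.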

DR.  ZADEH:  This  question  was  raised  earlier  when  somebody  asked 
Dennis  what  is  uncertainty.  And  there  is  of  course  an  obvious  answer  to 
this  question.  Some  people,  and  I  think  Stephen  took  that  position, 
would  differentiate  between  uncertainty  and  imprecision. 

There  are  other  people  who  would  say  imprecision  is  simply  a  kind 
of  uncertainty  so  there  is  sort  of  a  hierarchical  relationship  between 
them.  I  tend  to  take  the  latter  point  of  view. 

So uncertainty is something very general and there are different
kinds of uncertainty. But when you talk about imprecision or fuzziness,
you  are  talking  about  the  lack  of  sharp  boundaries. 


It  is  sort  of  a  situation  where  membership  in  a  class  is  a  matter 
of  degree.  So  when  you  are  talking  about  somebody  being  young  or 
whether  it  will  rain  tomorrow  or  not,  these  are  matters  of  degree,  even 
more  so  in  the  case  of  whether  it  will  be  warm  tomorrow  or  not. 

So  in  probability  theory,  the  concept  of  an  event  is  a  crisp 
concept.  In  other  words,  either  something  is  in  the  event  or  it  is  not. 
No  allowance  is  made  for  situations  in  which  an  event  can  take  place  to 
a  degree. 

One  of  the  things  that  fuzzy  logic  does  is  that  it  makes  it 
possible  to  enrich  your  language  by  allowing  you  to  deal  with  fuzzy 
events.  And  furthermore — and  this  is  also  an  important 
characteristic — it  makes  it  possible  for  you  to  describe  the 
probabilities  that  are  associated  with  fuzzy  or  crisp  events  in 
imprecise  terms. 

By  so  doing  then,  it  gives  you  more  tools  to  work  with.  It  gives 
you a more expressive language. So essentially the disagreement is
this.  There  are  some  people  who  say  that  the  language  of  probability 
theory  is  sufficient.  It  is  adequate.  Professor  Lindley  is  a  foremost 
exponent  of  that  point  of  view. 

There  are  other  people  who  say  no,  it  is  not  adequate.  It  is  not 
a  matter  of  saying  that  probability  is  wrong  or  right.  It  is  a  matter 
of  adequacy.  So  the  latter  position  then  is  that  it  cannot  cope  with 
problems  in  which  the  events  are  fuzzy  events. 

It  cannot  cope  with  situations  in  which  your  characterization  of 
probabilities  is  imprecise.  It  cannot  cope  with  those  problems.  This 
is  really  the  position  that  is  taken. 

So  long  as  you  stick  to  problems  in  which  the  events  are  crisply 
defined,  probabilities  are  crisply  defined,  then  you  stay  within  your 
probability  theory.  There  is  no  problem. 

DR.  LINDLEY:  Well,  I  want  an  example.  You  talk  about  rain 
tomorrow.  That  is  perfectly  crisply  defined  in  terms  of  the  probability 
distribution  of  the  millimeters  of  rain  that  will  fall.  Any  fuzzy 
statement  that  I  have  heard  you  make  can  be  stated  in  probabilistic 
terms. 


DR.  ZADEH:  What  about  a  warm  day,  because  warm  is  a  better 
example than rain?

DR. LINDLEY: Well, it is the probability of the temperature
tomorrow.

DR. SOYER: And you can always define warm. I can ask you what
you  mean  by  warm.  So  you  can  give  me  a  temperature.  Then  I  can  always 
redefine  the  event,  then  use  probability  theory. 




DR.  YAGER:  But  how  do  you  define  warm? 

DR.  SOYER:  I  can  ask  you  what  do  you  mean  by  warm. 

DR.  YAGER:  What  do  you  mean  by  warm? 

DR.  SOYER:  But  then  I  will  ask  your  subjective  opinion  about  it. 
For  example,  my  notion  of  warm  might  be  different  from  yours.  I  might 
say  that  warm  is  from  10  C  to  15  C. 

DR. YAGER: 9.98 is not warm?

DR. SOYER: It depends on your subjective opinion about what is
warm.

DR.  ZADEH:  That  is  precisely  the  point.  That  is  what  fuzzy 
logic  tends  to  get  away  from,  the  imposition  of  those  artificial 
thresholds.  It  is  not  that  9.999  is  cold  and  10  is  warm.  That  is  the 
point.  There  is  no  sharp  break.  It  is  a  matter  of  graduality. 

So  there  is  a  degree  to  which  it  will  be  a  warm  day.  So  the 
truth  value  is  a  value  between  zero  and  one. 

That  is  where  this  example  that  Professor  Lindley  was  talking 
about would not work, because you can no longer say that it is true or
false.  If  you  want  to,  you  can  use  multivalued  logic  for  the  assessment 
of  the  truth  value  of  the  statement  that  the  event  has  taken  place. 

That  is  the  point. 
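
The contrast between a threshold definition of "warm" and a graded one can be sketched as follows. The crisp version uses the 10 C to 15 C interval mentioned in the exchange. The graded version is a trapezoidal membership function whose breakpoints (8, 12, 15, 19) are hypothetical and chosen only for illustration, as is the trapezoidal shape itself.

```python
def warm_crisp(temp_c):
    """Threshold definition: warm means 10 C to 15 C, true or false."""
    return 1.0 if 10.0 <= temp_c <= 15.0 else 0.0


def trapezoid(x, a, b, c, d):
    """Membership rising from a to b, equal to 1 from b to c, falling from c to d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def warm_fuzzy(temp_c):
    """Graded definition: degree to which the day is warm (hypothetical breakpoints)."""
    return trapezoid(temp_c, 8.0, 12.0, 15.0, 19.0)
```

Under the crisp definition, 9.98 and 10 fall on opposite sides of the break (0 versus 1); under the graded one they receive nearly the same degree of membership, roughly .495 and .5.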

DR.  DeGROOT:  Could  you  say  a  little  bit  about  how  the  concept  of 
learning  would  enter  into  your  theory  of  fuzzy  logic? 

DR.  ZADEH:  Learning,  of  course,  is  a  very  complex  concept  in 
itself,  and  there  is  no  universally  agreed  upon  definition  of  what 
constitutes  learning.  But  one  simplified  perception  of  what  constitutes 
learning,  which  has  been  implemented  to  some  extent,  is  one  done  by 
Professor  Sugeno  in  Japan.  He  has  done  very  interesting  work  where  you 
have  a  rectangular  track  and  you  have  a  model  car  that  can  be  steered  by 
a  human. 

You  steer  it  and  it  does  certain  things.  Then  a  system  takes 
over.  The  system  which  takes  over  has  learned  the  algorithm  that  the 
person  uses  in  maneuvering  this  car  through  this  track. 

The  system  does  it  automatically.  No  matter  where  you  put  the 
car,  the  system  does  it.  So  the  system  has  learned  how  the  human 
operator  does  that  sort  of  a  thing.  This  is  something  that  has  been 
done  already. 

The  system,  for  example,  may  learn  how  to  park  a  car.  That  is  a 
little  more  complicated,  the  same  sort  of  a  thing.  But  the  rules 
learned  in  that  case  are  all  fuzzy  rules. 


It  would  be  impossible  to  do  that  sort  of  a  thing  using  crisply 
defined  rules.  It's  too  complicated.  This  is  part  of  what  is  called 
fuzzy  logic  control.  Eventually  the  control  is  nonfuzzy  but  on  the 
level  of  description  of  the  rule,  on  that  level  it  is  fuzzy. 

So  you  take  a  complicated  situation  and  you  try  to  describe  the 
relationship  between  variables  in  fuzzy  terms.  That  then  is  translated 
into fuzzy logic rules. Once it is translated then, it is implemented
deterministically.  That  is  the  way  it  is  done. 

DR. SHAFER: I think this level of description idea is a good
point.  But  why  couldn't  that  higher  level  of  description  be  worked  out 
in  terms  of  probability? 

DR.  ZADEH:  Too  complicated.  Let  me  give  you  an  example  that 
doesn't  involve  probabilities.  It  involves  the  description  of  a  curve, 
for  example,  I  want  you  to  describe  that  curve. 

I  want  you  to  describe  that  curve.  If  you  look  at  a  curve, 
qualitatively  you  can  say  when  X  is  small,  Y  is  large.  When  this  is 
this,  that  is  that.  And  in  those  fuzzy  terms  you  can  describe  roughly 
this curve. There is no probability involved in that sort of thing.

Now  you  could  describe  this  curve  point  by  point  but  it  is  too 
complicated.  You  can  capture  the  qualitative  behavior  of  that  curve  by 
giving  these  fuzzy  pairs:  if  X  is  this,  then  Y  is  that;  if  X  is  that, 
then  something-something,  and  so  forth  which  may  be  good  enough  for  your 
purposes. 

In other words, that definition or characterization of a
relationship  between  X  and  Y  might  be  sufficient  for  purposes  of 
control.  But  it  is  the  fact  that  we  are  using  imprecise 
characterizations  that  makes  it  possible,  that  makes  it  feasible  to  use 
a  relatively  small  number  of  rules. 

You  see,  if  I  ask  you  to  define  how  you  park  your  car,  you  will 
give  me  just  a  few  rules.  Those  rules  will  be  fuzzy  rules.  If  I  ask 
you to define it precisely, it will be an impossible problem. Too many
rules would be required for that purpose.
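
The idea of capturing a curve with a handful of rules such as "when X is small, Y is large" can be sketched as below. This is one simple way to turn such rules into a deterministic output, a membership-weighted average of the rule conclusions, and it is an assumption of this sketch rather than the specific method described by the speaker. The membership shapes, the range 0 to 10, and the representative Y values are all hypothetical.

```python
def small(x):
    """Degree to which x is small on a 0 to 10 scale."""
    return max(0.0, min(1.0, (5.0 - x) / 5.0))


def medium(x):
    """Degree to which x is medium, peaking at 5."""
    return max(0.0, 1.0 - abs(x - 5.0) / 5.0)


def large(x):
    """Degree to which x is large on a 0 to 10 scale."""
    return max(0.0, min(1.0, (x - 5.0) / 5.0))


# Three rules stand in for the whole curve:
# if X is small then Y is large (about 9), if X is medium then Y is about 5,
# if X is large then Y is small (about 1).
RULES = [(small, 9.0), (medium, 5.0), (large, 1.0)]


def rough_curve(x):
    """Blend the rule conclusions, weighting each by how well its condition fits x."""
    fired = [(membership(x), y) for membership, y in RULES]
    total = sum(weight for weight, _ in fired)
    return sum(weight * y for weight, y in fired) / total
```

Three rules reproduce the qualitative shape of a decreasing curve; the final output for any given X is an ordinary number, which matches the remark that the rules are fuzzy at the level of description while the implementation is deterministic.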

DR.  SINGPURWALLA:  Is  it  true  that  when  one  subscribes  to  fuzzy 
logic  one  essentially  abides  with  the  calculus  of  probability,  but  finds 
it  very  difficult  to  work  with,  and  therefore  as  an  approximation  one 
makes  compromises  and  moves  somewhere  else? 

DR.  ZADEH:  What  you  do  is  this:  you  certainly  accept 
probability  theory.  You  don't  challenge  probability  theory  in  any 
respect.  You  merely  say  that  the  language  of  probability  theory  is  too 
restrictive  to  deal  with  the  imprecision  that  one  finds  in  the  real 
world.




So  when  Professor  Lindley  says  that  the  probability  that 
Shakespeare  did  not  write  Hamlet  is  .2,  that  .2  has  a  certain  ring  of 
precision to it that is not really justified. In other words, nobody
can  say  that  it  is  .2  or  .3  or  something. 

In  Rasmussen's  report  you  arrive  at  the  conclusion  that  the 
probability  of  a  nuclear  accident  or  something-something  is  something  of 
10  to  the  minus  something.  There  is  absolutely  no  way  of  verifying  or 
proving  or  disproving,  whatever. 

In  other  words,  it  says  that  all  of  these  statements  are 
unrealistically  and  unjustifiably  precise;  that  the  most  that  you  could 
say  based  on  the  information  that  you  have  is  something  like  the 
probability  that  Shakespeare  did  not  write  Hamlet  is  quite  low.  That  is 
the  most  that  we  could  say  really. 

Any  other  numerical  value  is  misleading.  It  conveys  the 
impression  of  much  greater  degree  of  knowledge  and  understanding  than 
you  really  have. 

DR.  LINDLEY:  But  you  use  numbers.  In  that  example  you  used  .3. 
Well,  why  is  your  .3  different  from  my  .3? 

DR.  ZADEH:  These  numbers  are  on  a  different  level.  Probability 
uses  a  number  but  nevertheless  it  gives  you  a  less  precise 
characterization  of  a  deterministic  situation. 

DR. LINDLEY: But you say my .3 could be .4. Why can't your .3 be .4?

DR.  ZADEH:  No,  I  wouldn't  say  .2.  I  would  say  low. 

DR.  LINDLEY:  But  you  don't.  You  put  numbers. 

DR.  ZADEH:  In  the  definition  of  low. 

DR. LINDLEY: That paper of yours has duodenal ulcer of .3. Now
what  I  am  saying  to  you  is  that  you  are  committing  yourself  to  numbers. 
If  you  are  committing  yourself  to  numbers,  why  is  your  commitment  to 
those  sorts  of  numbers  better  than  my  commitment  to  another  sort  of 
number? 

They  are  both  fuzzy.  They  are  both  crisp.  I  don't  see  the 
difference.  The  point  is  that  your  numbers  combine  in  a  peculiar  way 
and mine combine in another peculiar way, but at least I can justify my
peculiarity. 

(Laughter)


DR.  YAGER:  In  a  certain  sense,  the  difference  is  in  the  quality. 
When  you  say  probability  of  .3,  you  are  committed  to  one  number,  okay? 
But  when  you  talk  about  a  fuzzy  set  and  you  assign  numbered  values  to 
it,  you  give  a  whole  bunch  of  numbers  which  in  a  sense  sort  of  nullify 
each other. Each number in itself is not as significant as the whole
bunch. So you could be off on one number and this doesn't have to be as
precise as that one number that you give.


When you give .3, that is a very precise one piece of
information. When you give a fuzzy set, you are giving a whole bunch of
numbers. It is the total of the numbers that counts and so if an
individual number is off it really doesn't matter.

It is sort of the same thing as in probability: when you have
a whole bunch of readings and you give the mean, you lose a lot of
information compared with having all the values.

DR.  LINDLEY:  When  you  give  a  membership  function  for  fuzzy  sets, 
it  is  a  probability  distribution,  which  is  a  whole  set  of  numbers. 
There is an exact parallel between a fuzzy membership function and a
probability  function.  They  are  both  a  lot  of  numbers. 

DR.  WATSON:  I  don't  think  that's  right,  Dennis. 

(Laughter)

It  seems  to  me  that  the  situation  you  have  to  compare  is  your 
saying .2 and Lotfi's saying very small or quite small. He might
articulate that quite small by a fuzzy set, which will be a whole set of
numbers. And it is his set of numbers which is being compared with your
one number. What Ron is saying is that you can afford to be out quite a
bit on one or more of this whole set of numbers. You can afford for
your function to be playing around a bit and you hopefully won't have
much  in  the  way  of  a  different  conclusion. 

But  if  your  .2  were  out  and  was  .25  instead,  it  may  affect  the 
answer.  But  that  of  course  is  the  crucial  test  for  the  difference 
between  the  two.  What  is  the  implication  of  the  different  theories? 

It seems to me that what we need to do, particularly in fuzzy set
theory, is to test how sensitive the output implications
are to these input membership functions.

I  suspect  that  they  should  be  quite  sensitive  and  this  does  worry 
me.  But  I  don't  think  you  are  right  in  the  two  things  you  are 
comparing. 

DR.  SHAFER:  In  the  case  of  the  fuzzy  control  story,  if  I  got  the 
story  right,  the  numbers  are  actually  pretty  precise  and  they  are  gained 
from  the  experience  of  the  calibrating  trials.  Is  that  right? 

DR.  ZADEH:  No,  that's  the  point.  You  use  fuzzy  control  in 
situations  in  which  you  have  a  great  deal  of  robustness.  So  coarse 
control is adequate. You wouldn't use that kind of control in a
situation  in  which  a  high  level  of  precision  is  expected. 




So  you  essentially  take  advantage  of  the  tolerance  for 
imprecision.  Let  me  give  you  a  quick  example  of  that  sort  of  thing. 
Suppose  you  want  to  park  your  car.  Now  when  you  go  to  park  your  car, 
the final position of the car is not specified very precisely.

We  want  it  within  a  few  inches  of  the  curb  and  the  angle  could  be 
something-something.  And  humans  can  take  advantage  of  this  imprecision. 
So  it  takes  very  little  time  to  park  your  car. 

But if I specified the final position precisely, if I said the
car  should  be  within  one-hundredth  inch  of  something  and  it  should  be 
somewhere  plus/minus  three  seconds  of  an  arc  of  something-something,  it 
would  take  you  five  years  to  park  your  car. 

That  is  the  point.  So  there  are  many,  many  situations  in  which 
there  is  this  tolerance  for  imprecision.  In  whatever  you  do,  you  take 
advantage  of  that.  If  I  want  to  go  through  this  door,  it  is  not  that 
important as to whether I pass five inches on this side or five inches
on that side, and my actions are governed and influenced by this lack of
need  for  precision. 

So  this  is  essentially  what  you  try  to  take  advantage  of  when  you 
use  fuzzy  logic  control.  It’s  the  tolerance  for  imprecision. 

DR. LINDLEY: But that is exactly captured by utility functions
in  the  distance  from  the  curb. 

DR.  DeGROOT:  My  understanding  from  what  you  are  saying  is  I 
think  we  all  agree.  There  are  many  situations  where  knowing  that 
usually  the  situation  is  thus  and  so  is  enough  to  allow  us  to  act.  And 
no probabilists or Bayesians--or whatever we are being called these
days--would disagree with that. I recognize, even though I am geared up
to do a Bayesian analysis in a given problem, that I may not have to
specify my probability distribution down to the probability of the last
possible event, because as Dennis says, I know the utility function and
I  know  the  specific  problem  at  hand  would  only  require  a  few  crude 
measures  of  probability. 

And  in  those  cases  I  am  sure  we  would  both  do  the  same  things. 

You  say  usually  it  is  thus  and  so  and  so  you  are  going  to  do  it  a 
certain  way,  and  I  say  well  the  probability  is  pretty  large  on  this  and 
I  don’t  have  to  do  it. 

I  am  certainly  not  going  to  waste  my  time  doing  calculations,  as 
was  suggested  earlier,  on  a  thousand  dimensional  parameter  space  when 
all I need is a very crude probability of a certain event to determine
my  action. 

So I think that what you are saying is often interpretable in
terms of probability. You are saying there is no need to gather very
precise information in many circumstances, and I agree.




But  what  do  you  do  when  you  do  need  more  precision  to  choose  a 
reasonable  action? 

DR.  ZADEH:  Use  the  classical  probability  theory. 

DR.  DeGROOT:  Good.  I  didn't  like  the  word  "classical"  but  - 
(Laughter) 

DR.  SINGPURWALLA:  You  mentioned  the  Rasmussen  report  and  you 
mentioned  the  probability  of  accident  or  whatever  it  is  that  they  were 
looking  into  with  a  certain  number. 

I  think  for  the  record  I  should  also  say  that  that  particular 
number  in  the  Rasmussen  report  had  an  uncertainty  statement  connected 
with  it.  It  was  done  using  a  fault  tree  where  they  had  uncertainties 
for  all  basic  events  and  it  was  the  propagation  of  those  uncertainties 
using  the  regular  calculus  of  probability  that  was  used  to  arrive  at  the 
top number plus an interval around it. So uncertainties can and have
been  assigned  to  probabilities. 
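
The propagation of basic-event uncertainties through a fault tree, as described in this remark, can be sketched with a small Monte Carlo simulation. Everything specific below is an assumption made for illustration: the three-event tree, the independence of the basic events, the lognormal form of each uncertainty, and the medians and error factors. None of these values come from the report under discussion.

```python
import math
import random


def sample_lognormal(rng, median, error_factor):
    """Draw a basic-event probability whose 95th percentile is median * error_factor."""
    sigma = math.log(error_factor) / 1.645
    return min(1.0, median * math.exp(rng.gauss(0.0, sigma)))


def top_event(p_a, p_b, p_c):
    """Hypothetical tree: TOP = (A and B) or C, with independent basic events."""
    p_and = p_a * p_b
    return 1.0 - (1.0 - p_and) * (1.0 - p_c)


def propagate(n=10000, seed=0):
    """Return the 5th, 50th and 95th percentiles of the top-event probability."""
    rng = random.Random(seed)
    draws = sorted(
        top_event(
            sample_lognormal(rng, 1e-2, 3.0),    # hypothetical event A
            sample_lognormal(rng, 1e-2, 3.0),    # hypothetical event B
            sample_lognormal(rng, 1e-4, 10.0),   # hypothetical event C
        )
        for _ in range(n)
    )
    return draws[int(0.05 * n)], draws[n // 2], draws[int(0.95 * n)]
```

Calling `propagate()` gives a central value for the top event together with an interval around it, obtained using nothing beyond the ordinary calculus of probability. How much that interval deserves to be trusted depends entirely on the input distributions, which is the concern raised in the reply that follows.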

DR. ZADEH: I think you will agree with me that whatever
intervals were associated with it were excessively precise in relation to
our understanding of the whole thing. It could be way off.

DR.  SINGPURWALLA:  Oh,  I  agree  with  you  that  there  may  have  been 
strains  of  optimism  there,  and  I  agree  that  there  may  be  strains  of 
handwaving  and  all  kinds  of  other  things  that  went  in  the  Rasmussen 
Report.  But  the  point  is  that  it  used  the  calculus  of  probability  to 
assess  uncertainty. 

What  is  at  issue  here  is  the  implementation  rather  than  the 
philosophy  of  the  logic  which  went  into  coming  up  with  these  numbers.  I 
think what we are discussing is the means rather than how it was done.

I  agree  with  you  that  it  may  have  been  done  in  a  sloppy  way. 

DR.  DeGROOT:  Well,  to  follow  up  on  that  comment  and  come  back  to 
the  point  that  was  made  before,  I  think  that  if  Dennis  tells  us  that  his 
probability  is  .2  that  Shakespeare  wrote  Hamlet,  and  we  say  that  that  is 
a  totally  precise  statement,  no  denying  that,  I  think  we  are  entitled  to 
ask  him,  as  some  of  us  were  asking  before,  how  did  he  arrive  at  that  .2. 

And I think we are entitled to know that if he is thinking of ten
different possible contingencies, what probabilities would he assign to
Shakespeare  having  written  Hamlet  under  each  of  those  ten  possible 
contingencies.  I  think  we  are  then  entitled  to  know  those  ten 
conditional  probabilities,  as  well  as  the  ten  probabilities  that  he  is 
using  as  weights  over  which  he  is  averaging  them. 

His .2 that he gives us as his final overall marginal probability
is a weighted average of many probabilities. And so we want to know
what those many probabilities are and what the weights are, and I think
we are entitled to ask him for those. And we are entitled to disagree
with whatever aspects of those 2N numbers he has to tell us.




That  will  permit  us  to  think  about  how  we  want  to  think  about  the 
probability  that  Shakespeare  wrote  Hamlet.  Do  we  want  to  simply  accept 
his  .2?  Do  we  want  to  raise  it  or  lower  it  because  we  disagree  with 
some  of  the  components  that  went  into  it? 

I  don't  think  to  do  a  Bayesian  analysis  means  that  you  only 
report  a  single  number  at  the  end  and  everyone  operates  from  that.  Not 
at  all.  Indeed  it  is  your  responsibility  as  a  scientist  to  report  all 
the  probabilities  that  went  into  this  final  overall  expectation. 

Every  probability  is  an  expectation  or  a  weighted  average.  So  I 
think  that  should  be  part  of  it.  That  doesn't  contradict  the  Bayesian 
approach  to  say  that  there  are  many  numbers  which  enter  into  it. 
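
The statement that an overall probability is a weighted average of conditional probabilities can be written out directly. The sketch below uses three contingencies instead of ten to keep it short, and every weight and conditional probability in it is invented for illustration. It shows how a listener who is given the 2N component numbers can substitute a single one and obtain a revised marginal probability.

```python
def marginal(weights, conditionals):
    """Overall probability as the weighted average of conditional probabilities."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to one"
    return sum(w * p for w, p in zip(weights, conditionals))


# Hypothetical assessment: three contingencies, hence 2N = 6 reported numbers.
weights = [0.5, 0.3, 0.2]          # probability of each contingency
conditionals = [0.1, 0.2, 0.45]    # probability of the event under each contingency

reported = marginal(weights, conditionals)

# A listener who disagrees only with the third conditional probability
# replaces it and recomputes.
revised = marginal(weights, [0.1, 0.2, 0.9])
```

With these numbers the reported value is .2 and the listener's revised value is .29, which is the kind of adjustment that reporting the components, not just the final figure, makes possible.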

Yes?  You  have  been  waiting  very  patiently. 

DR.  YAGER:  Stephen  brought  up  the  whole  idea  of  how  you  get 
these  membership  grades  and  things  like  this.  It  seems  to  me  that  the 
diagram  that  Glenn  drew  in  his  talk  was  very  interesting  about  where  all 
this  stuff  fits  into  expert  systems  in  that  we  have  a  sort  of  natural 
language,  and  we  convert  that  to  some  sort  of  mathematics.  I  think  that 
is  what  we  are  really  all  doing  here.  What  we're  doing  is  manipulating 
the  mathematics  and  then  coming  out  with  some  sort  of  linguistic 
information  at  the  end,  and  then  it  goes  to  some  sort  of  user. 

It just dawned on me it would be interesting to look at the fact
that  if  you  give  a  person  who  has  to  make  some  decisions — let's  say 
somebody  in  the  Navy,  for  example — if  you  give  them  information  that 
says  that  the  probability  of  the  enemy  doing  this  is  .8  or  if  you  give 
him  the  information  that  the  probability  of  the  enemy  doing  this  is 
high,  I  wonder  if  he  may  be  more  able  to  deal  with  the  fact  that  it  is 
high  than  the  fact  that  it  is  .8. 

Somehow  .8  is  a  very,  very  sort  of  lonely  number  standing  out 
there  in  the  middle  of  nowhere.  Somehow  I  have  the  feeling  that  high  or 
some  linguistic  information  sort  of  tunes  in  much  better  to  his  own 
decision-making  system.  We  have  to  remember  to  provide  this  information 
for  users. 

DR.  FISHBECK:  That  depends  on  how  he  is  trained.  I  submit  that 
.8  can  be  very  meaningful  to  somebody  rather  than  high,  slurring  the 
Navy  like  that  I  guess. 

DR.  YAGER:  No,  no.  I  am  saying  any  human  being. 

DR.  GROSS:  I'm  Don  Gross  at  George  Washington  University.  I 
wonder,  when  Dennis  came  out  with  the  statement  that  the  probability 
that  Shakespeare  wrote  Hamlet  was  .2,  there  was  sort  of  an  "uh." 

Would  it  have  made  any  difference  to  us  if  he  had  said  the 
probability  that  Shakespeare  wrote  Hamlet  was  .25  or  .15?  I  think  the 
impact  of  the  statement  was  that  it  was  a  low  probability. 




It seems almost that it depends on the application -- I don't
want to say implementation -- of the situation of which the statement is
made.  Would  it  have  made  any  difference  to  us  if  he  would  have  said  .25 
instead  of  .2? 

(Pause)

DR.  DeGROOT:  Is  there  anyone  here  to  whom  it  would  have  made  a 
difference? 

(Laughter) 

Except  Dennis. 

(Laughter) 

DR.  SHAFER:  I  think  the  swine  flu  story  is  a  good  story  which 
has  been  cited.  It  is  very  justifiably  a  story  where  numbers  would  have 
helped  a  lot  because  you  had  a  communication  problem  where  stories — it 
turns  out  that  words  do  convey  pictures  to  people  and  sometimes  not  the 
pictures  that  were  meant  to  be  conveyed,  and  it  goes  from  one  person  to 
another  and  it  gets  changed  more  easily  if  it  is  words  than  if  it  is 
numbers. 

DR. DeGROOT: Mr. Wise?

DR.  WISE:  I  asked  this  question  originally  when  Professor 
Lindley  just  started  talking  and  I  can't  resist  bringing  in  a  little 
point  from  physics.  That  is,  in  quantum  mechanics  it  is  considered  a 
big  discovery  that  electrons  do  not  act  like  little  painted  balls  and 
urns. 

And  you  can't  characterize  them  with  single  numbers.  At  the 
minimum,  you  need  complex  numbers.  And  that  is  a  formalism  they 
developed  to  try  to  explain  their  experiments.  They  did  an  experiment 
to  prove  that. 

And  if  we  are  worrying  about  philosophical  foundations,  we  have 
yet  to  demonstrate  that  one  number  is  sufficient.  Maybe  it  is  a  triple. 
Maybe it is a triple of complex numbers. In some cases, one number won't
do. 

DR.  DeGROOT:  Dennis? 

DR.  LINDLEY:  The  situation  there  as  I  understand  it  is  as 
follows.  Suppose  that  you  did  give  a  pair  of  numbers  instead  of  one 
number  and  suppose  that  you  scored  that  pair  of  numbers. 

Then  it  follows  that  you  are  redundant  and  really  you  need  only 
have  given  one:  the  probability. 


But  suppose  you  give  two  numbers  and  you  use  two  score  functions 
at  the  same  time,  okay?  What  these  two  score  functions  are  trying  to  do 
is  to  express  somehow  different  qualities  of  the  system. 

If  you  used  two  score  functions — I  must  say  I  haven't  followed 
through  the  mathematics — but  it  looks  pretty  clear,  of  course,  that  you 
would, in fact, finish up with two numbers. What their rules would be,
I don't know.

But  a  single  measure  of  worth  leads  to  probability.  Two  measures 
of  worth  will  lead  to  something  more  complex. 
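
The claim that a single score function leads back to probability can be illustrated with the quadratic (Brier) score. That choice of score is an assumption made here; the remarks do not name one. If an event has probability p and the reported number is x, the expected penalty is p(1 - x)^2 + (1 - p)x^2, which equals (x - p)^2 + p(1 - p) and is therefore smallest at x = p. The sketch checks this numerically on a grid.

```python
def expected_quadratic_score(report, p):
    """Expected penalty for reporting `report` when the event has probability p."""
    return p * (1.0 - report) ** 2 + (1.0 - p) * report ** 2


def best_report(p, steps=1000):
    """Grid search over reports in [0, 1] for the smallest expected penalty."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda x: expected_quadratic_score(x, p))


# For an event of probability .2, the penalty-minimizing report is .2 itself.
best = best_report(0.2)
```

Under this score, a forecaster who reports any number other than the probability accepts a larger expected penalty.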

DR.  WISE:  No.  I  would  argue  otherwise  because  in  quantum 
mechanics  you  have  a  complex  number  and  the  magnitude  of  the  number  is 
in  fact  probability.  And  so  your  argument  applies  very  neatly  to  the 
magnitude.

But  in  order  to  work  with  them  and  add  them,  you  have  to  have  the 
interference  reinforcement  effects  you  get  with  complex  numbers.  You 
can't  get  the  correct  probabilities  at  the  end  unless  you  do  all  the 
calculations  in  the  middle  with  complex  numbers  even  though  you  are 
scoring  probabilities  at  the  end. 
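
The interference effect described here can be sketched with two indistinguishable paths to the same outcome. Under the standard Born rule the probability is the squared magnitude of the total amplitude, so the amplitudes have to be added as complex numbers before any probability is taken. The amplitudes below are hypothetical values chosen to make the effect visible.

```python
import cmath

# Hypothetical amplitudes for two indistinguishable paths to one outcome.
amp_a = cmath.rect(0.5, 0.0)         # magnitude 0.5, phase 0
amp_b = cmath.rect(0.5, cmath.pi)    # magnitude 0.5, opposite phase


def prob(amplitude):
    """Born rule: probability is the squared magnitude of the amplitude."""
    return abs(amplitude) ** 2


# Adding per-path probabilities ignores the phases entirely.
naive = prob(amp_a) + prob(amp_b)

# Adding amplitudes first lets the opposite phases cancel (interference).
interfering = prob(amp_a + amp_b)

# With equal phases the same magnitudes reinforce instead.
reinforcing = prob(amp_a + cmath.rect(0.5, 0.0))
```

The three results differ even though every individual amplitude has the same magnitude, which is why the intermediate calculation cannot be carried out with single real numbers.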

DR.  LINDLEY:  Yes.  I  am  afraid  I  can't  argue  about  the 
technical — 


DR.  WISE:  My  question  is  why  do  you  think  that  is  a  unique 
situation?  Why  do  you  think  in  all  other  cases  a  single  number  is 
sufficient? 

DR.  LINDLEY:  Well,  I  don't  think  a  single  number  is  necessarily 
sufficient.  What  I  am  saying  is  that  if  you  are  prepared  to  do  it  by 
means  of  a  single  number,  and  as  I  see  it,  despite  the  statements  which 
have  been  made,  belief  functions  and  fuzzy  people  do  use  a  single 
number.  That  if  you  use  the  single  numbers,  those  single  numbers  must 
be  probabilities. 

Now if you are going to use pairs of numbers or more complicated
things  and  you  only  use  one  score  function,  then  I  am  sure  it  comes  back 
to  probabilities.  If  you  use  two  score  functions,  then  I  am  not  clear 
what  happens  but  yes,  you  may  need  two  numbers. 

DR.  SOLAND:  I  can  inject  something  about  expert  systems  which  I 
don't know very much about. I do know from the paper of Professor
Zadeh's that I read that there appear to be some applications of fuzzy-set
theory which sound very much to me like expert systems in terms of
control.

So  there  is  some  proof  in  the  pudding  that  it  has  worked,  at 
least  for  expert  systems.  Do  we  find  the  same  thing  with  a  complete 
Bayesian  probability  analysis  and  belief  function  analysis?  Can  we 
point  to  some  working  expert  systems  or  prototype  expert  systems  based 
upon  these?  And  what  has  to  be  done  to  make  them  better  or  to  get  them 
if we don't have them yet? Professor Shafer didn't give enough detail
for me to follow about what the difficulties were in using them,
probabilities  for  example,  in  some  of  the  expert  systems  that  he  talked 
about.  But  if  there  are  operational  difficulties,  then  I  think  we  ought 
to  discuss  those  because  maybe  one  of  the  big  benefits  of  the  fuzzy  set 
approach  is  that  it  avoids  certain  operational  difficulties. 

DR.  DeGROOT:  That's  a  good  point.  The  subject  of  the  conference 
is, at least in part, expert systems and it would be interesting to
hear  some  more  about  the  operational  relationship  of  these. 

DR.  ZADEH:  Of  course,  many  of  the  people  who  work  in  fuzzy 
logic,  myself  included,  were  brought  up  on  probability  theory.  Most  of 
my  best  friends  are  probabilists. 

(Laughter) 

So  it  is  not  a  strange  subject.  I  include  Professor  Lindley  in 
that  class.  So  it  is  not  something  that  you  are  not  aware  of.  The 
point  is  that  contrary  to  what  is  accepted  in  the  case  of  the  classical 
probability  theory,  you  do  make  a  differentiation  between  something  that 
is imprecise in the sense of being possibilistic, and something that is
probabilistic. 

In  that  case,  "high"  interpreted  as  a  possibility  distribution  is 
a  generalization  of  an  interval.  An  interval  is  not  a  probability 
distribution.  By  "high"  you  mean  more  than  so  many  degrees,  or  so  many 
inches,  or  whatever. 

When  you  take  high  to  mean  something  like  possibility 
distribution  defined  by  one  of  these  curves,  that  is  an  extension  of  an 
interval.  It  is  not  an  extension  of  constant  probability. 

It  is  precisely  because  of  the  lack  of  differentiation  between 
the  two--possibility  and  probability — that  we  find  ourselves  in 
situations  where,  contrary  to  what  Professor  Lindley  said,  there  are 
many  problems  that  cannot  be  handled  within  the  framework  of  probability 
theory. 


The  rules  of  combination  are  quite  different.  You  don't  combine 
possibilities  the  way  you  combine  probabilities.  And  there  are  many, 
many  examples  of  situations  in  which  if  you  interpret  these  things  as 
probabilities  you  get  completely  wrong  results  or  else  horribly 
complicated  results.  It  is  one  or  the  other. 
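
The difference in combination rules can be sketched as follows. In possibility theory the possibility of "x is one of several values" is the maximum of the individual possibilities, whereas the probabilities of mutually exclusive values add. The piecewise-linear shape used for "high", its breakpoints, and the probability distribution are all assumptions for illustration; the remarks refer only to "one of these curves".

```python
def possibility_high(x, start=0.5, full=0.75):
    """Possibility that a value x in [0, 1] is compatible with 'high'.

    Zero up to `start`, one from `full` upward, linear in between:
    a graded extension of the interval [full, 1].
    """
    if x <= start:
        return 0.0
    if x >= full:
        return 1.0
    return (x - start) / (full - start)


def possibility_of_set(values, dist):
    """Possibility of 'x is one of values': the maximum over the set."""
    return max(dist(v) for v in values)


def probability_of_set(values, pmf):
    """Probability of 'x is one of values': the sum over the set."""
    return sum(pmf[v] for v in values)


# Hypothetical probability distribution over three candidate values.
pmf = {0.25: 0.25, 0.625: 0.5, 1.0: 0.25}

poss_union = possibility_of_set([0.625, 1.0], possibility_high)   # max rule
prob_union = probability_of_set([0.625, 1.0], pmf)                # sum rule

# Possibility grades are not required to sum to one.
poss_total = sum(possibility_high(v) for v in pmf)
```

Treating the possibility grades as if they were probabilities would fail at once here, since they total more than one.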

So  now  the  point  that  was  made  here  by  David  is  that  it  is  not 
the  matter  of  acceptability.  Of  course  everybody  would  prefer  to  have  a 
crisp  number  of  .8  to  high.  It  is  a  question  of  justifiability. 

Is it really justified to say .8 based on the information that
you have? I would like to return to the point you made and I think it
is a very good one -- I hope Professor Lindley will not take offense to
that.




I am pretty sure if he stood in front of a blackboard and
explained  to  us  the  way  that  figure  of  .2  was  arrived  at,  we  would  not 
accept  that  kind  of  an  analysis. 

We  would  see  that  it  is  shot  through  with  all  sorts  of 
assumptions  and  this  and  that.  And  in  the  end  we  would  say  look,  .2  has 
no  justification  whatsoever;  that  the  most  you  could  have  said  in  terms 
of  intervals  is  between  zero  and  .5,  or  in  fuzzy  terms  that  it  is  low. 
That  is  the  most  you  can  say. 

Most  of  us  also  would  like  .2.  So  that  what  we  have  to 
differentiate  is  between  acceptability  and  justifiability.  It  is  a  very 
different  thing.  Is  it  justified  to  be  that  precise? 

In  the  case  of  classical  expert  systems,  MYCIN  and  PROSPECTOR,  my 
contention  is  that  whatever  answer  they  come  up  with,  those  certainty 
factors  are  unjustifiably  precise.  Unjustifiably  precise.  But  people 
like  that. 

DR.  SPIEGELHALTER:  I  don't  want  to  jump  to  what  I  am  going  to 
say  tomorrow,  but  for  any  point  probability  that  anyone  gives  out,  one 
should  definitely  give  ranges,  and  there  are  at  least  three  different 
types  of  ranges  that  one  could  give  around  it. 

Certainly a point probability on its own is certainly without any
justification for it. But there is no need to deviate from probability
theory in order to provide some idea of the possible variation around
that point probability.

If  one  has  to  act,  then  the  point  probability  is  the  one  that  one 
should  use.  But  in  order  to  justify  it  to  someone,  then  it  is  quite 
reasonable  that  the  possible  variation  in  that  probability  by  a  slight 
change  in  the  analysis,  by  the  imprecision  of  the  inputs,  can  be  given 
out  as  part  of  the  output. 

That  again  is  like  it  says  high,  and  you  can  say  what  does  high 
mean.  Similarly  in  a  probabilistic  system,  it  will  say  .2  and  you  say 
what  does  that  .2  mean?  And  it  will  give  you  a  distribution  around  that 
.2  and  tell  you  what  the  distribution  means. 

DR.  ZADEH:  Is  it  a  probability  distribution? 

DR.  SPIEGELHALTER:  Well,  there  are  different  distributions  one 
could  give.  One  can  give  a  probability  distribution. 

DR.  ZADEH:  Then  you  are  talking  about  second  order  probability? 

DR.  SPIEGELHALTER:  Yes,  essentially.  What  I  am  saying  is  there 
is the hierarchy of uncertainties about the imprecision on the
probabilities, and whether or not it's represented by the second order
probability distributions or whether it's represented by fuzzy calculus
is an imprecise number that's put on there.


If  one  is  really  looking  for  the  meaning,  the  only  things  I 
understand  are  probability  distributions  and  the  only  thing  the  people  I 
work  with  understand  are  probability  distributions. 

I  would  not  want  to  use  a  system  that  generated  something  that  I 
could  not  explain.  One  thing  that  has  been  mentioned,  in  Dennis’s  talk, 
is  the  idea  of  external  calibration  of  probabilities.  Can  they  have  a 
meaning  calibratable  against  events  in  the  world? 

Okay,  everyone  has  to  bring  in  some  idea  of  a  long-run  frequency 
to that argument, which is perhaps unacceptable to really pure
Bayesians, but there is that idea of meaning that can be given to the
numerical outputs. And I find those concepts of meaning and
justification missing in linguistic output from an expert system.

DR.  YAGER:  You  say  the  only  thing  you  understand  is  probability 
distributions.  In  point  of  fact,  possibility  distributions  are  all  over 
the  place.  A  perfect  example  of  that  is  think  about  linear  programming, 
for  example. 

Are  you  familiar  with  linear  programming?  You  have  linear 
programming  and  you  have  some  function  you  want  to  maximize,  and  you 
have  some  constraints.  You  cut  off  this  space  and  you  say  what  solution 
optimizes  this. 

And  before  you  do  the  operations  you  say  well  it  has  to  be 
something  within  this  space  of  possible  solutions.  Some  definitely 
can't  be  and  some  definitely  can  be. 

That  is  a  possibility  distribution,  albeit  one  that  just  says 
zero/one  membership  grade,  but  that  is  a  perfect  example  of  possibility 
distribution.  Then  if  you  look  at  the  objective  function  maybe  you 
could  sort  of  say  the  answer  can't  be  over  here,  it  could  be  over  here, 
and  may  be  over  here;  and  maybe  you  could  get  some  other  numbers  other 
than  zero  or  one.  But  that  is  a  possibility  distribution. 
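
The zero/one case described here can be sketched directly: the feasible region of a linear program acts as a possibility distribution that grades each candidate solution as possible (1) or impossible (0). The particular constraints are hypothetical and serve only as an example.

```python
# Hypothetical constraints of a small linear program in two variables:
#   x >= 0,  y >= 0,  x + y <= 4,  x + 3*y <= 6
def feasible_possibility(x, y):
    """Return 1 if (x, y) satisfies every constraint, else 0."""
    ok = x >= 0 and y >= 0 and x + y <= 4 and x + 3 * y <= 6
    return 1 if ok else 0


inside = feasible_possibility(1, 1)     # satisfies all four constraints
outside = feasible_possibility(4, 1)    # violates x + y <= 4
on_corner = feasible_possibility(3, 1)  # both resource constraints are tight
```

Grades between zero and one would correspond to the softer judgments mentioned in the remarks, such as a solution that could be, but probably is not, in a given part of the region.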

DR.  SPIEGELHALTER:  Sounds  like  a  restriction  of  a  range.  One 
can  state  one's  own  personal  uncertainty  as  to  where  in  that  range  the 
thing  is. 

DR. SINGPURWALLA: You're uncertain where the solution lies. You
can  give  a  probability  for  the  solution  lying  on  each  corner  before  you 
solve  the  problem,  so  I  don't  see  any  difficulty  with  linear 
programming. 

DR.  DeGROOT:  Let  me  try.  I  think  that  he's  not  talking  about 
second  order  probabilities.  What  we  are  talking  about--and  I  agree  with 
much  of  what  Lotfi  Zadeh  said — but  I  don't  like  the  term  that  I  wouldn't 
agree  with  Dennis's  probability  of  .2  because  it  is  unnecessarily 
precise. 




I  may  not  agree  with  that  because  when  he  told  me  his  argument  I 
would  see  there  were  many  other  possible  probabilities  and  if  I  wanted 
to  think  about  it  I  would  try  and  determine  some  for  myself  and  might 
arrive  at  a  final  overall  probability  quite  different  from  .2. 

I think what that means is that the .2 -- not that it is
unnecessarily precise -- just because I wouldn't agree with it, but that
it is very sensitive. There are some probabilities that are
insensitive, insensitive to learning, to further data.

If  Dennis  says  that  he  has  studied  this  for  ten  years  or  more  and 
studied  the  problem  and  looked  into  all  the  possible  relevant  sources  of 
information  and  so  on  and  he  knows  whatever  there  is  to  know  about  the 
subject,  the  probability  is  .2  and  there  is  very  little  that  anyone  else 
could  tell  him  at  this  point  that  would  change  those  probabilities,  that 
is  one  kind  of  probability  .2. 

There  are  many  other  kinds  of  probability  .2's,  namely  very 
sensitive  ones  where  any  little  bit  of  further  information  could  change 
that  probability  dramatically.  And  I  believe  that  is  what  you  would 
refer  to  as  a  situation  where  you  really  need  some  sort  of  a  fuzzy 
statement.  I  wouldn't  say  the  probability  is  unnecessarily  precise.  I 
would  say  it  is  very  transient  in  nature  and  very  sensitive,  in  a  sense, 
to  any  other  little  bit  of  information.  If  I  go  home  tonight  and  I 
think about it, it would change from .2 just by remembering Shakespeare
from  high  school  and  God  knows  what. 

So  maybe  there  is  not  much  point  in  specifying  an  exact  number  if 
it  is  going  to  change  very  soon  anyway.  I  mean  we  all  know  the  example 
about  the  probability  is  .5  of  getting  a  red  ball  from  the  box  because 
we  have  no  idea  of  the  contents,  and  the  probability  is  .5  about  getting 
a  red  ball  from  the  box  because  we  are  certain  that  exactly  half  the 
balls  are  red. 

They  are  both  .5.  To  me  they  mean  exactly  the  same  thing  but  I 
certainly  recognize  that  one  is  very  sensitive  to  further  information 
and  one  is  totally  insensitive  to  further  information. 

I  don't  think  you  have  to  get  into  second  order  probabilities, 
but just if we recognize that our overall final marginal probability is
a  weighted  average  of  some  things. 

In  one  case,  it  is  a  weighted  average  of  wide  variety  and  in  the 
other  case  it  is  a  weighted  average  of  a  very  tight,  tight  range.  Maybe 
we  will  settle  these  issues  because  maybe  we  are  all  trying  to  say  the 
same  thing  in  somewhat  different  languages. 
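
One way to make the two kinds of .5 concrete is to model the unknown fraction of red balls with a Beta distribution. That model is an assumption introduced here for illustration, and the parameter values are hypothetical; the remarks themselves argue that second-order probabilities are not strictly required. Both priors below give .5 for a red ball on the next draw, but they respond very differently to seeing a single red ball.

```python
def predictive_red(a, b):
    """Probability the next ball is red under a Beta(a, b) prior on the red fraction."""
    return a / (a + b)


def update(a, b, reds, non_reds):
    """Conjugate Beta update after observing draws made with replacement."""
    return a + reds, b + non_reds


vague = (1.0, 1.0)         # no idea of the contents: uniform over the red fraction
firm = (5000.0, 5000.0)    # nearly certain that half the balls are red

before = (predictive_red(*vague), predictive_red(*firm))

# Observe one red ball and recompute the predictive probability.
vague_after = predictive_red(*update(*vague, reds=1, non_reds=0))
firm_after = predictive_red(*update(*firm, reds=1, non_reds=0))
```

The vague .5 moves to about .67 after one observation while the firm .5 barely moves, which is the sensitivity distinction drawn in the remarks.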

DR.  DeGROOT:  David,  don't  give  your  talk  for  tomorrow  now. 

DR. SPIEGELHALTER: Okay. I will be expanding on exactly that
tomorrow. 


DR. DeGROOT: Oh, I'm sorry. I didn't mean to give your talk for
tomorrow here.

(Laughter)

I think we have time for just a couple more questions.

DR. KONG: I think I agree totally with what Professor DeGroot
has  said.  I  have  been  looking  at  the  belief  functions  for  a  few  years 
and  I  have  been  like  using  sort  of  something  like  upper  and  lower 
probabilities  and  I  never  pretend  to  say  that  because  I  have  two  numbers 
it  is  better  than  one — of  course  something  like  an  interval  of  .1  and 
cannot  be  more  precise  than  something  like  .3.  In  the  sense  of  what  is 
the  precision  of  the  value  .1  and  .M?  In  fact,  sometimes  I  may  prefer 
something  like  an  interval  like  the  evidence  that  the  belief  function  is 
based  on — I  think  this  is  something  which  Glenn  has  written  on  for  a 
long  time. 

Let's say if I have a belief function which has .1 probability
supporting the proposition of A and .4 probability supporting the
proposition not A, then basically I've got an interval for the
proposition A, something like from .1 lower probability to .6 upper
probability.

So  I  have  a  range.  But  another  way  to  look  at  it  is  I  actually 
start  with  a  Bayesian  distribution  function  which  has  like  .2  and  .8 
probability, which is a Bayesian distribution because it adds up to one.

But then I reevaluate my evidence and sometimes it seems that my
evidence may not be relevant in this situation, and I say maybe about .5
of  the  times  I  think  this  piece  of  evidence  may  not  be  dependable  at 
all. 

And  when  this  piece  of  evidence  is  not  dependable,  then  I  don't 
know  anything  at  all.  Then  what  we  would  do  is  basically  we  would 
discount the Bayesian distribution of .2 and .8 by sort of a factor of
.5.

Again  this  value  .5  is  not  really  precise.  It  is  not  exactly  .5. 
I may mean something around .48 but I just picked .5. So by doing that
I  end  up  with  what  I  originally  have.  I  have  the  range  from  .1  to  .6. 
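
The arithmetic of this discounting step can be sketched as below, using the numbers from the remarks: a Bayesian assignment of .2 to A and .8 to not A, discounted by .5. The function and key names are hypothetical. The rule applied is the usual discounting of a belief function, in which every mass is scaled by one minus the discount rate and the remainder is given to the whole frame.

```python
def discount(masses, rate):
    """Scale every mass by (1 - rate) and give `rate` to the whole frame ('either')."""
    out = {k: (1.0 - rate) * v for k, v in masses.items()}
    out["either"] = out.get("either", 0.0) + rate
    return out


def belief_interval(masses, hypothesis, complement):
    """Lower probability: mass on the hypothesis. Upper: 1 minus mass on its complement."""
    lower = masses.get(hypothesis, 0.0)
    upper = 1.0 - masses.get(complement, 0.0)
    return lower, upper


bayesian = {"A": 0.2, "not A": 0.8}
discounted = discount(bayesian, 0.5)      # masses .1 on A, .4 on not A, .5 uncommitted
lower, upper = belief_interval(discounted, "A", "not A")
```

The resulting interval for A runs from .1 to .6, matching the range stated in the remarks.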

And  basically  what  this  will  do  is  when  I  have  other  evidence, 
when  I  come  up  with  other  evidence  which  is  sort  of  maybe  conflicting, 
sort  of  a  point  towards  another  direction,  then  if  I  have  a  discounting 
factor  which  is  big,  then  it  will  be  much  more  sensitive  to  the  new 
evidence.  It  will  be  dominated  by  the  new  evidence,  especially  when  it 
is  very  contradicting. 


DR. DeGROOT: Well, we will talk about that tomorrow, discounting
factors.  I  have  some  questions  about  those,  too. 


DR. KONG: So basically I just want to say that using two numbers
doesn't  imply  that  we  think  it  is  more  precise  than  one  number.  It  is 
not  that  at  all. 

DR.  DeGROOT:  Mr.  Wise,  a  short  comment,  please. 

DR.  WISE:  Real  quick  comment.  If  you  are  using  upper  and  lower 
probabilities  and  you  are  trying  to  make  decisions  and  look  at  basic 
decision  theory,  it  can  make  a  big  difference  whether  you  assume  that 
the  probabilities  are  uniform,  in-between,  or  skewed  to  one  end. 

And  if  you  just  strictly  use  the  upper  and  lower  probabilities, 
you  don't  know  and  it  could  greatly  affect  your  decision. 
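
A small sketch of why the interval alone may not settle a decision. The payoffs are hypothetical, and the interval from .1 to .6 is borrowed from the earlier remarks on discounting. Acting pays off only when the probability exceeds a break-even point, and with these payoffs that point falls inside the interval.

```python
# Hypothetical decision: act (gain 10 if A is true, lose 5 if not) or wait (payoff 0).
GAIN, LOSS = 10.0, 5.0


def expected_utility_of_acting(p):
    """Expected payoff of acting when A has probability p."""
    return p * GAIN - (1.0 - p) * LOSS


def best_action(p):
    """Choose the action with the larger expected payoff."""
    return "act" if expected_utility_of_acting(p) > 0 else "wait"


lower, upper = 0.1, 0.6
break_even = LOSS / (GAIN + LOSS)     # probability at which acting has zero expected payoff

at_lower = best_action(lower)                      # probability skewed to the low end
at_middle = best_action((lower + upper) / 2.0)     # probability taken at the midpoint
at_upper = best_action(upper)                      # probability skewed to the high end
```

Because the preferred action changes between the low end and the midpoint, the bounds by themselves leave the choice open.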

DR.  SOYER:  Whatever  the  value  P  is,  it  is  just  his  subjective 
probability.  The  argument  on  precision  then  is  a  problem  of  the 
decision-maker  when  he  is  evaluating  Professor  Lindley's  probability. 

So  according  to  the  decision-maker's  belief  about  his  expertness,  he  can 
always  change  that  probability  by  making  a  conditional  statement.  So  it 
is  just  a  matter  of  evaluation  of  the  probability  forecaster  or 
predictor. 

DR.  DeGROOT:  Okay.  Well,  I  think  this  is  a  good  start  for 
tomorrow's  discussions.  I  do  want  to  say  something  important  before  we 
close.  That  is,  it  has  been  a  long  time  since  I  have  stayed  awake  for  a 
whole  day  of  talks  and  I  did  today,  and  I  think  it  was  not  only  because 
I  had  to  be  awake  at  the  end  to  stand  up  here  and  do  this,  but  also 
because  all  three  talks  were  really  excellent. 

They  were  pitched  at  a  level  that  I  could  understand.  They  were 
clear  and  I  was  very  impressed  and  I  learned  a  lot  about  presenting  good 
talks  today  and  I  do  want  to  congratulate — I  hope  you  will  join  me  in 
congratulating  all  three  speakers.  Thank  you. 


GENERAL  DISCUSSION  II* 


DR.  DeGROOT:  Are  there  other  questions  pertaining  to  David's 
talk?  We  welcome  general  comments  also  at  this  time.  I  would  like  to 
reserve  the  hour  after  lunch  to  give  each  of  our  four  speakers  15 
minutes  or  so  to  give  their  responses  and  general  impressions  of  these 
two  days.  If  there  are  other  comments  prior  to  lunch,  I  would  welcome 
hearing  them.  In  particular,  I  again  renew  my  invitation  of  those  of 
you who have comments about the practical implementability of the
various  systems.  I  know  we  would  like  to  hear  those.  Art,  do  you  have 
a  comment? 

DR.  DEMPSTER:  I  have  had  lots  of  turns,  if  there  are  others.  I 
could  raise  a  topic. 

DR.  DeGROOT:  You  win. 

DR.  DEMPSTER:  In  a  sense  I  will  follow  up  a  little  on  Morrie's 
issue  about  the  variability  or  refinement  of  probabilities  and  will  try 
to  relate  it  to  something  that  Lotfi  Zadeh  has  been  saying.  Lotfi  has 
raised  the  issue  a  number  of  times,  about  complexities  somehow  are  an 
enemy of probability or the believability of probability.

And  there  is  a  sense  in  which  I  am  sympathetic  with  that  idea.  I 
feel  that  as  an  applied  Bayesian  statistician  when  I  make  the  models 
more  and  more  complex,  I  perhaps  trust  them  less  as  I  have  to  do,  and  he 
states  somehow  as  though  it  is  a  consequence  of  fuzzy  logic  that  somehow 
the  probabilities  will  become  more  confused.  They  will  just  have  a 
range zero to one and will be essentially useless and I am interested in
knowing,  and  perhaps  he  can  elaborate  on  this  sometime,  in  what  sense  is 
that  defensible?  What  sort  of  logic  is  used  to  draw  that  conclusion? 

I  think  the  issue  is  important  in  part  because  in  Bayesian 
theory,  if  you  have  a  Bayesian  model,  the  more  information  you  get  about 
the  patient  in  some  sense,  the  better  off  you  have  to  be.  You  have  a 
more refined judgment and I am sure mathematical criteria measures of
information  could  be  created  that  showed  that  in  fact  you  get  more 
information.  So  in  that  theory  more  information  is  always  better,  but 
apparently  according  to  fuzzy  logic,  more  information  can  be  worse. 

So  there  is  a  kind  of  paradox  here  which  might  help  to  resolve 
the  difference  between  the  approaches.  I  think  again  that  belief 
functions  are  very  likely  on  the  same  side  as  probability  in  this 
dispute,  although  I  am  not  familiar  with  the  technical  aspect  of  belief 
functions  which  would  argue  that  more  data  is  more  information  in  the 
probabilistic  sense. 


*Session followed Dr. Spiegelhalter's presentation and its discussion.


So I just wanted to raise this issue for further discussion.

DR. DeGROOT: Gabe.

DR. PEI: I don't know if some of your comments were directed
toward me, but I have an example of an application of Bayesian modeling.
It is an area I've been working in for the last several years, having to
do with the Navy, particularly with the search for moving targets. The
approach has been extremely basic, where we come out with prior
distributions for the target's location and model detection capabilities
as carefully as we can, and then we update posterior distributions for
the target.

On paper it seems to hold together very well and the individual
parts of the model seem to calibrate very well. We can, for example,
calibrate the capabilities of sensors. We can, for example, validate
the behavior of moving targets. When you want to assemble all of these
models together and try to do prediction or try to optimize various
plans, we have difficulty and probably some of the problem is due to the
stochastic nature of the problem, the thing that evolves over time. We
have tried to come out with stochastic models. What you do is try to
take that into account. But they never seem to behave in exactly the
way we expect them to behave when we go out and try to use them.

Part of the difficulty I think has to do with the amount of
information that we think is out there and we have come up with plans.
We have come out with predictions which after a period of time tend to
be very precise. As a matter of fact, even if you get no observable
detection, that is still information in the sense of where you think the
target is not. And so after a period of time your estimates become
very, very precise and yet many of the times they are completely off,
totally off. That is simply because over any reasonable stretch of time
the models tend to break down.
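
The effect of an unsuccessful search can be sketched with a Bayesian update over a handful of cells. The prior, the detection probability, and the simplification to a stationary target are all assumptions for illustration; the models described in the remarks also account for target motion.

```python
def update_after_miss(prior, searched, p_detect):
    """Posterior over cells after searching cell `searched` and detecting nothing.

    P(no detection | target in cell c) is (1 - p_detect) if c was searched, else 1.
    """
    unnormalized = [
        p * ((1.0 - p_detect) if c == searched else 1.0)
        for c, p in enumerate(prior)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


prior = [0.25, 0.25, 0.25, 0.25]    # hypothetical prior over four cells
posterior = prior
for _ in range(5):                  # five fruitless looks in cell 0
    posterior = update_after_miss(posterior, searched=0, p_detect=0.8)
```

After five misses the posterior has all but ruled out cell 0. The estimate looks very precise, yet it is only as good as the assumed detection and motion models, which is the difficulty raised in the remarks.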

Now, we can try to patch the models together by adding more
parameters. But it turns out that human operators do a lot better in
these models if they can interpret the process at any time to
re-initialize the problem at any time and do a lot better than any
automated process which we can build into that.

So that is a puzzling one -- well, I don't know if it is
puzzling, but it is an aspect of trying to use probabilistic statistical
methods to a very practical problem which tends to have certain
limitations. I don't know what the solution to this is. I don't know
if anyone has any comments on that.

DR. DeGROOT: Thank you. Mr. Kong, do you want to describe some
of your work? This might be a good time if you would like to do that,
maybe take about ten minutes or so.


MR. KONG: What I am working on is a causal medical diagnostic
model. The basic building block of our model consists of a symptom S and
diseases $D_1, \ldots, D_n$, which are possible causes of S.

Consider the simple case where there are only two possible
causes, $D_1$ and $D_2$. The relationships between the diseases is
represented by the following diagram

(Diagram 1)

The arrows pointing from $D_1$ and $D_2$ to S indicate the causal
relationships. Probabilities $p_1$ and $p_2$ are attached to the arrows.
There is an arc between the two arrows with a probability q attached to
it.

The number $p_i$ is interpreted as a causal probability. It is not
the conditional probability of the symptom given the disease $D_i$. It is
the probability that $D_i$ causes S given the presence of $D_i$. Thus it can
be thought of as the lower probability of S conditioned on the presence
of $D_i$. This is because if $D_i$ does not cause S, S can still be present
because of other causes. What we are doing is attaching a probability
to a logical statement. Logical statements can be represented by sets
as illustrated in Table 1. (For notations, let $\mathcal{D}_1 = \{d_1, \bar{d}_1\}$,
$\mathcal{D}_2 = \{d_2, \bar{d}_2\}$ and $\mathcal{S} = \{s, \bar{s}\}$ be the outcome spaces of
$D_1$, $D_2$ and S respectively, with $d_1$, $d_2$ and $s$ denoting the presence
of the diseases and the symptom, and $\bar{d}_1$, $\bar{d}_2$ and $\bar{s}$ denoting the
absence of the diseases and the symptom.)

Table 1

Logical statement    Probability attached    Set representation

$d_1 \to s$          $p_1$                   $\{(d_1, s), (\bar{d}_1, s), (\bar{d}_1, \bar{s})\}$

$d_2 \to s$          $p_2$                   $\{(d_2, s), (\bar{d}_2, s), (\bar{d}_2, \bar{s})\}$


Consider the logical statement "$d_1$ implies $s$". If $d_1$ is the
outcome, then $s$ must be the outcome. On the other hand, if $\bar{d}_1$ is the
outcome, then both $s$ and $\bar{s}$ are possible. This explains the set which
corresponds to "$d_1 \to s$" in Table 1.
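
The set representation of an implication can be checked by enumeration. In this sketch True stands for presence and False for absence; the encoding and the variable names are assumptions made for illustration.

```python
from itertools import product

# Outcome spaces: True means present, False means absent.
D1 = (True, False)
S = (True, False)


def implies(a, b):
    """Material implication: false only when a holds and b does not."""
    return (not a) or b


# Subset of the product space D1 x S compatible with "d1 implies s".
d1_implies_s = {(d1, s) for d1, s in product(D1, S) if implies(d1, s)}

# The single pair excluded by the statement: disease present, symptom absent.
ruled_out = set(product(D1, S)) - d1_implies_s
```

Three of the four pairs survive, and the one ruled out is the pair in which the disease is present and the symptom is absent.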




DR. WISE: Are those last two pairs (referring to the first line
of Table 1) supposed to be identical? Is there a bar?

MR. KONG: No, they are not identical. There is a bar here
(referring to the bar over $s$ in the last pair). The product space
corresponding to $D_1$ and $S$ has four elements. Here I allow three of
the four. The pair $(d_1, \bar{s})$ is ruled out. The situation is
similar for $d_2$; $d_2$ implies $s$ with probability $p_2$. The only
difference between line 2 and line 1 of the table is that all the 1's
are changed to 2's.

Now we come to the arc and the number $q$. The arc stands for the
logical statement "$s$ implies at least one of $d_1$ and $d_2$" with
probability $q$ attached to it. Both diseases can be present. Notice
that the logical statement "$s$ implies at least one of $d_1$ and $d_2$"
is equivalent to the logical statement "$\bar{d}_1$ and $\bar{d}_2$
implies $\bar{s}$". The set which corresponds to this logical statement
is the product space $\mathcal{D}_1 \times \mathcal{D}_2 \times \mathcal{S}$
minus the element $(\bar{d}_1, \bar{d}_2, s)$.

The arc and the arrows each correspond to a belief function.
For example, the arrow pointing from $D_1$ to $S$ represents a belief
function with two focal elements. They are
$\{(d_1,s),\ (\bar{d}_1,s),\ (\bar{d}_1,\bar{s})\}$, the set in Table 1,
and the product space $\mathcal{D}_1 \times \mathcal{S}$. The basic
probability assignments are $p_1$ and $1 - p_1$ respectively. Assuming
that the $p$'s and $q$ are independent probabilities, the belief
functions corresponding to the arrows and the arc can be combined over
the product space $\mathcal{D}_1 \times \mathcal{D}_2 \times \mathcal{S}$
using Dempster's Rule. Renormalization is not necessary when combining
these belief functions because there is no conflict between them. Also
notice that the belief functions are all purely relational, meaning that
the marginal belief functions of $D_1$, $D_2$ and $S$ are all vacuous.


DR. SINGPURWALLA: When you say these probabilities are
independent, what do you mean?


MR. KONG: It means that the diseases are considered as
independent causes of the symptom. Consider this (referring to diagram)
as an example. The diseases $D_1$ and $D_2$ are independent causes of
$S$. Each of $D_1$ and $D_2$ can cause $S$, but they do not interact
with each other. For instance, if both $D_1$ and $D_2$ are present, then
the probability that they will not cause the symptom is
$(1 - p_1)(1 - p_2)$.
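
The two-disease building block described above can be checked numerically. The following is only a minimal sketch, not Mr. Kong's implementation: the values of p1, p2 and q are made up for illustration, and the helper names (`simple_support`, `combine`, `bel`, `pl`) are hypothetical. It builds the three belief functions for the two arrows and the arc, combines them with Dempster's rule over the eight-element product space, and then conditions on both diseases being present.

```python
from itertools import product

# Frame of discernment: every (d1, d2, s) triple, True meaning "present".
FRAME = frozenset(product([True, False], repeat=3))


def simple_support(holds, weight):
    """Mass `weight` on the worlds where `holds` is true, the rest on FRAME."""
    focal = frozenset(w for w in FRAME if holds(*w))
    return {focal: weight, FRAME: 1.0 - weight}


def combine(m1, m2):
    """Dempster's rule; returns (combined masses, total conflicting mass)."""
    out, conflict = {}, 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            if x * y == 0.0:
                continue
            both = a & b
            if both:
                out[both] = out.get(both, 0.0) + x * y
            else:
                conflict += x * y
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {k: v / (1.0 - conflict) for k, v in out.items()}, conflict


def bel(m, event):
    """Belief: total mass of focal elements contained in `event`."""
    return sum(v for focal, v in m.items() if focal <= event)


def pl(m, event):
    """Plausibility: total mass of focal elements that meet `event`."""
    return sum(v for focal, v in m.items() if focal & event)


p1, p2, q = 0.6, 0.5, 0.9  # illustrative values only, not from the talk

arrow1 = simple_support(lambda d1, d2, s: (not d1) or s, p1)    # d1 implies s
arrow2 = simple_support(lambda d1, d2, s: (not d2) or s, p2)    # d2 implies s
arc = simple_support(lambda d1, d2, s: (not s) or d1 or d2, q)  # s implies d1 or d2

joint, k12 = combine(arrow1, arrow2)
joint, k123 = combine(joint, arc)

symptom = frozenset(w for w in FRAME if w[2])
both_diseases = frozenset(w for w in FRAME if w[0] and w[1])

# Conditioning is Dempster combination with certainty on the observed event.
conditioned, _ = combine(joint, {both_diseases: 1.0})
bel_symptom_given_both = bel(conditioned, symptom)
```

With these inputs the conflicting mass is zero at both combination steps, the marginal on the symptom is vacuous (belief 0, plausibility 1), and the belief in the symptom given both diseases equals 1 - (1 - p1)(1 - p2), in line with the statements in the discussion.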


Dependent causes can possibly be modeled by something like this


(Diagram 2)


The node $D_3$ can either be real, meaning that there is actually a
disease $D_3$, or artificially constructed. The numbers $p_3$, $p_{13}$
and $p_{23}$ are probabilities attached to the corresponding arrows in
the diagram. Here $D_1$ and $D_2$ are dependent causes of $S$ because
they are both causes of $D_3$, which in turn is a cause of the symptom
$S$.

A special case of this is the problem of disease class, which I
think both Glenn and David have talked about before. Assume that
diseases $D_1$ and $D_2$ form a disease class which we call $D_3$. The
relationships between $D_1$, $D_2$ and $D_3$ can then be represented by
this


(Diagram  3) 




The $p$'s are both 1's because if the patient has one of $D_1$ and
$D_2$, he must have the disease class $D_3$. Similarly, $q$ is 1 because
if the patient has the disease class $D_3$, then he must have at least
one of $D_1$ and $D_2$.

In the case of disease class, sometimes we may want to simplify
the problem by assuming that the patient has one and only one disease.
In our model, this is not necessary. We make this additional assumption
only if it is reasonable. In most situations multiple diseases should
be allowed. Our model is quite flexible in this respect.

These causal structures (referring to Diagram 1) are just
building blocks of a model. The models I have been studying are called
layered models. The diseases and symptoms are grouped into layers,
something similar to what David has talked about. The idea is that we
only allow arrows pointing from a node to nodes which are located one
level above it. So the model may look something like this


This model is more general than the tree graph that Glenn has described.
If we ignore the direction of the arrows, there can be loops in the
graph. On the other hand, if we follow the directions of the arrows, we
won't run into loops.

I arrange the nodes into layers mainly for the purposes of
implementation and computation. The general case will be that we have a
set of nodes and a set of arrows pointing from nodes to nodes. If it is
true that there are no loops if we follow the directions of the arrows,
then we can always rearrange the nodes (maybe some artificial nodes have
to be added) such that they form layers.
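
The rearrangement into layers can be sketched as follows. This is an illustration under assumptions, not the procedure used in the work being described: it uses the common longest-path layering, and the function name `layer_dag` and the naming scheme for artificial nodes are hypothetical. Every arrow that would skip a level is split by inserting artificial nodes, so that each arrow ends exactly one level above where it starts.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def layer_dag(arrows):
    """Assign layers to a loop-free set of arrows (src, dst).

    Returns (layer, layered_arrows), where every arrow in layered_arrows
    goes from one layer to the layer directly above it.
    """
    preds = {}
    for src, dst in arrows:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)

    # Longest-path layering: a node sits one level above its highest cause.
    # static_order() raises graphlib.CycleError if the arrows contain a loop.
    layer = {}
    for node in TopologicalSorter(preds).static_order():
        layer[node] = 1 + max((layer[p] for p in preds[node]), default=-1)

    layered_arrows = []
    for src, dst in arrows:
        prev = src
        # Insert one artificial node per skipped level.
        for level in range(layer[src] + 1, layer[dst]):
            dummy = f"{src}->{dst}@{level}"  # hypothetical naming scheme
            layer[dummy] = level
            layered_arrows.append((prev, dummy))
            prev = dummy
        layered_arrows.append((prev, dst))
    return layer, layered_arrows


# D1 and D2 cause D3, D3 causes S, and D1 also causes S directly.
example_arrows = [("D1", "D3"), ("D2", "D3"), ("D3", "S"), ("D1", "S")]
example_layer, example_layered = layer_dag(example_arrows)
```

In the example the direct arrow from D1 to S skips the level that holds D3, so one artificial node is placed on that level and the arrow is split in two.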

The arrows and arcs in a graph correspond to simple belief
functions over the product space. Theoretically, combining these belief
functions over the product space will give us the global relationships
of the diseases and the symptoms. On the other hand, the amount of
computation required may make this practically impossible.

What we are interested in doing is the following. We have a
patient who is observed to have certain characteristics, the absence and
presence of some symptoms. Based on the observations, we want to find
the conditional marginal belief functions of some diseases we are
interested in. The major roadblock to this task is again the amount of
computation required. Fortunately, the amount of computation can be
reduced by taking advantage of the layered structure of the diseases and
the symptoms.

Consider a two-layer model where the bottom layer consists of
diseases and the top layer consists of symptoms. The diseases are
marginally independent, meaning that if nothing is known about the
symptoms, then the observation of one disease does not provide
information about another disease. This property does not hold for the
symptoms. One way of thinking about this is that information can
propagate downwards and then upwards, but it cannot propagate upwards
and then downwards. Because of the latter, the amount of computation
can be reduced.

But even though we have this kind of structure, actual
computation of the joint belief function will still be impossible in
most cases. The approach I am thinking of right now is the Monte Carlo
method. The Monte Carlo method may work in this case because we are
interested in some marginal beliefs instead of the joint belief function
itself.

If we have 100 diseases and symptoms, the product space, which is
also the frame of discernment, has 2 to the 100 elements. The belief
function over the product space is defined by 2 to the 2 to the 100
numbers. Since we are only interested in the marginal beliefs of a few
diseases, we can use the Monte Carlo method to estimate only those
numbers we need.
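
The talk does not say how the Monte Carlo estimate would be organized, so the following is only one possible sketch under stated assumptions. It reuses the two-disease building block with made-up values for p1, p2 and q. Each sample decides independently which logical statements are "in force", intersects them with the observation, rejects the sample if nothing is left (conflict), and otherwise counts whether the remaining set lies inside the event of interest. The names `RULES` and `estimate_belief` are hypothetical, and the brute-force enumeration of worlds is workable only because the example has three variables.

```python
import random
from itertools import product

# Worlds are (d1, d2, s) triples, True meaning "present".
WORLDS = list(product([True, False], repeat=3))

# (weight, logical statement) pairs; the weights are illustrative only.
RULES = [
    (0.6, lambda d1, d2, s: (not d1) or s),        # d1 implies s, p1
    (0.5, lambda d1, d2, s: (not d2) or s),        # d2 implies s, p2
    (0.9, lambda d1, d2, s: (not s) or d1 or d2),  # s implies d1 or d2, q
]


def estimate_belief(event, observed, n_samples, rng):
    """Estimate Bel(event | observed) by sampling focal elements."""
    hits = kept = 0
    for _ in range(n_samples):
        # Each statement is in force with its own independent probability.
        active = [rule for weight, rule in RULES if rng.random() < weight]
        focal = [w for w in WORLDS
                 if observed(*w) and all(rule(*w) for rule in active)]
        if not focal:
            continue  # conflict with the observation: reject the sample
        kept += 1
        hits += all(event(*w) for w in focal)
    if kept == 0:
        raise ValueError("every sample conflicted with the observation")
    return hits / kept


# Symptom observed, D2 known to be absent: how strongly is D1 supported?
estimate = estimate_belief(
    event=lambda d1, d2, s: d1,
    observed=lambda d1, d2, s: s and not d2,
    n_samples=20_000,
    rng=random.Random(0),
)
```

In this small case the exact answer is q, because D1 is forced exactly when the arc statement is in force, so the estimate should land close to 0.9. A model with 100 nodes would need a cheaper way to test set containment than listing all worlds.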


DR. DeGROOT: Thank you very much. No, we don't have time to run
through the two to the two to the 100 right now, but after lunch
perhaps.

(Laughter)

DR. DeGROOT: Are there further comments or questions?

DR. ZADEH: There is a branch of logic called inductive logic,
which is very closely related to this whole thing. With inductive logic
you have your associated probabilities with formulas. For example,
(illegible) implies (illegible), or whatever. We have formulas. So you

associate the probabilities with each formula and then ask what will be
the probability of some other formula.

Generally, you come up with bounds. That is the standard thing,
which relates to Dr. Shafer's theory in that you associate probabilities
not with atoms but with formulas, with propositions, and that is why this
resulted in what Augustine just said. Basically what happens in that
logic, very closely related to that, is that you generally come up with
bounds on the probability of a given proposition or formula built from
those propositions.
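
A small worked case shows the kind of bound meant here. The premises are an assumption chosen for illustration, not something stated in the discussion: a probability for a proposition A and a probability for the material implication "A implies B" (that is, "not A, or B"). Since "A and not B" is the complement of the implication, P(A and B) is fixed at P(A) + P(A implies B) - 1, and P(B) can lie anywhere from that value up to P(A implies B). The function name is hypothetical.

```python
def modus_ponens_bounds(p_a, p_a_implies_b):
    """Bounds on P(B) given P(A) and P(not A, or B)."""
    for value in (p_a, p_a_implies_b):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # P(A and not B) = 1 - P(A implies B) cannot exceed P(A).
    if p_a + p_a_implies_b < 1.0:
        raise ValueError("the two premises are not jointly coherent")
    lower = p_a + p_a_implies_b - 1.0  # all of B lies inside A
    upper = p_a_implies_b              # B also covers all of "not A"
    return lower, upper


lower, upper = modus_ponens_bounds(0.9, 0.8)
```

With P(A) = 0.9 and P(A implies B) = 0.8, the premises pin P(B) down only to the interval from about 0.7 to 0.8, which is the "bounds rather than a single number" outcome described above.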

And this is the work that some people in AI have become
interested in more recently. They call it probabilistic reasoning.
Probabilistic reasoning is very closely related to inductive logic and
some of the basic papers on this subject go back many years. Also,
Barry Adams has written a large number of papers dealing with the
question of propagation of probabilities from the premises to the final
conclusion.




DR. DeGROOT: Are there any comments over here? All right, let us
adjourn for lunch. After lunch I will call on Glenn Shafer, Lotfi Zadeh
again and Dennis Lindley, give them each about 15 minutes or so and
David also, although we have heard from you more recently, and give them
a chance to summarize their views and differences with the others.

Thank  you. 




CONCLUDING  DISCUSSION 


DR.  DeGROOT:  Thank  you.  This  session  is  scheduled  to  give  our 
speakers  a  chance  to  give  their  reactions  to  the  proceedings  and  I  would 
like  to  re-introduce  Glenn  Shafer. 

DR. SHAFER: When Nozer was encouraging me yesterday to try to
make a case for belief functions I sat down and tried to think of some
things I should say and I am not sure I have produced the strongest case
I can. I think it may be more of a case of taking the opportunity to
say things the second time.

I thought I would start with a caricature of a 150-year-old
controversy in statistics, the controversy between Bayesian and
frequentist points of view. One way to caricature that is that the
frequentist view wants to look only at the probability judgments that
are based on observed frequencies, which is a very ideal, particular
kind of evidence. It is not only observed frequencies, but clearly
relevant observed frequencies.

The Bayesian view is a little different, especially if you look
not at the Bayesian view as it might have been, not at what we now think
of as Bayesian ideas, but at the Bayesian philosophy: de Finetti, Savage
and Ramsey. This philosophy seems not to pay any attention to the
quality of the evidence at all.

One view says if we do not have this very special kind of
evidence we can't make probability judgments at all. The other view is
that the quality of evidence does not matter: if we can write down
events A and B, then write down P of B, then we should be able to make a
judgment about that, whether we have good evidence for it or bad
evidence for it. The philosophy does not have any place in it for
discussing that.

Presumably that is left, in some sense, to the pragmatics of the
subject. It is just not in the philosophy of the subject. I want to
pose to you the question "how can we possibly find the middle ground
between these two extremes?" It seems to me we need a vocabulary, a way
of talking that naturally leads a little more to a middle ground. I
think the way of talking that we need is to talk about constructive
probability, to emphasize the fact that probability judgments are the
things that we can construct based on evidence.

Another way of putting that is to emphasize that a probability
analysis is only an argument. I have tried in some of the things I have
written in the last few years to give some depth to that by talking
about comparison to canonical examples. In my talk yesterday, I was
talking about how belief functions have one set of canonical examples
and Bayesian calculations have another set of canonical examples.




What you are doing when you make a probability analysis, it seems
to me, is you are taking a natural situation and making a comparison to
those canonical examples and you are usually doing that by comparing
pieces and then trying to put things together to fit that picture. It
seems to me that process does constitute an argument, because you are
saying, look, knowing this is like knowing that there was a 20 percent
chance of this happening and knowing this is like knowing there is a 30
percent chance of this happening if such and such is true.

The  element  of  argument  there  is  that  maybe  it  is  convincing  to 
say  it  is  like  knowing  that  and  maybe  it  is  not  convincing.  Maybe  your 
evidence  does  seem  to  have  that  strength  and  structure  and  maybe  it  does 
not.  When  Professor  Lindley  was  talking  about  his  20  percent 
probability  for  Shakespeare  writing  Hamlet,  what  was  it  we  wanted  him  to 
give  us?  Professor  DeGroot  gave  one  way  of  explaining  why  we  were  not 
happy  with  what  he  was  saying.  Professor  DeGroot's  way  of  explaining  it 
was,  well,  we  would  like  to  see  him  break  that  down  into  a  weighted 
average  of  conditional  probabilities  so  that  we  could  go  and  look  at  it 
and  perhaps  give  some  different  values  there  and  come  up  with  our  own 
judgments.

I  think  that  is  right  as  far  as  it  goes,  but  I  would  like  to  go  a 
little  farther.  I  think  what  we  really  want  to  see  there  is  an 
argument.  We  want  to  see  what  Dennis'  evidence  is  and  we  want  to  see 
how  he  is  putting  it  together  and  what  kind  of  argument  he  has  for  his 
high degree of doubt that Shakespeare wrote Hamlet. It is not
necessarily the case that, having seen his argument, I will feel that I can
produce  an  argument  for  myself,  because  I  may  not  be  able  to.  I  may  not 
find  the  whole  analysis,  the  whole  set  of  supporting  evidence.  I  may 
not  find  anything  convincing  about  it  at  all.  I  may  not  be  able  to  find 
it  and  put  my  own  numbers  in  there. 

But what I want to do is see that argument, to give myself a
chance to be convinced. One aspect of this idea, that a probability
analysis is only an argument, is just what I said, that there may not be
any argument. It may not be the case that there is a right way of
analyzing this evidence that is convincing and that in probability terms
we can produce either a Bayesian argument or belief function argument.
Maybe there is not anything convincing or that is going to be convincing
there.

So the general slogan that I would use to summarize those ideas
is that probability is constructive. There was a remark made by, I
think it was Ben Wise yesterday, and repeated by him and Terry Ireland
today, in a discussion we had over dinner, which I have tried to resist
and I think I should talk a little bit about.

I better get the slide. Okay, we are back to Fred. Fred is 80
percent reliable. Fred came up and told me the streets outside were
icy. I think that Fred is 80 percent reliable. Eighty percent of the
time when he wanders up to me and says something, he knows what he is
talking about and he is telling me the truth. So that 80 percent is an
argument which I would think of as an 80 percent reason for believing
that the streets are icy outside.


Now, my attitude is that I think that is a good argument. It is
only an 80 percent argument, but that is what it is worth, and since I
am willing to talk about constructing probability judgments, I am
willing to talk about thinking of that evidence alone and making a
judgment on the basis of it, making an 80 percent judgment on the basis
of it.

Now, you might say, what about your other evidence? Well, my
attitude is I might have some other good evidence about this question
and I might not. If we are willing to take this point of view that
evidence does differ in quality, then we have to contemplate the
possibility that the other evidence I have about whether it is icy
outside may not measure up to the quality of Fred's testimony. It may
be that I could make probability judgments about other items of evidence
and bring them in and strengthen my argument. That may be the case.

On the other hand, it may be that the quality of the other
evidence I could bring to bear is so poor that putting it together with
my judgment about Fred's reliability would only weaken my argument. So
that is the general point I make.

So the Bayesian analysis that I show you on the screen -- there
are many Bayesian analyses, but the one I suggested on the screen, in my
talk yesterday, had two additional judgments. One was a prior
probability that it was icy outside and the other is the probability
that Fred would be accurate if he is careless. I did not emphasize the
prior probability argument; historically it is usually number one for
arguments about Bayesian and non-Bayesian methods. I think that is a
little bit artificial.

Prior probability seems basically to refer to other evidence. It
may well be the case that I have other evidence. In the second version
of the story I told, I had a thermometer which I thought was better
evidence than Fred. So it may well be that I have other evidence and
that I should carry the argument further and take that other evidence
into account. I think it is much more likely that that would be the
case than that I could supply this number two here, the probability that
Fred will be accurate if he is careless. I mean that is just personal,
since I made up the story. You could make up a story you like better.

But it does seem to me that I can very well imagine having a
general impression about a person's reliability and being willing to
compare my situation to sort of a random draw from the different
situations where he is reliable and not reliable. I can well imagine
that being convincing to me, while it would not be convincing for me to
try to model just what is going on in terms of what his chances are of
accidentally hitting the truth, if he is being careless. That may just
seem much more speculative to me and I may not feel like I can make a
convincing comparison of that part of the situation to the picture of
chance.


That is why I might feel that it would just weaken my argument to
have to make a judgment there. These are the basic kinds of attitudes
that I think you have to have to find an interest in belief functions,
because it is the case that, given a belief function argument, you can
always make a Bayesian argument that will take into account what you are
doing as a belief function argument, and it will also take more things
into account, which will undeniably be a better argument if you could
get all of the pieces to make it go.

So I think the only reason for being interested in the belief
function argument is feeling that somehow these incomplete models as
arguments may be of some interest. You just don't have the strength of
evidence to make the more beautiful construction, in the sense that
there isn't any question that the Bayesian logic (if you have all of the
pieces to put it together, just on that side) is much more; because of
its greater completeness, it has much more convincing logic to it.

Now I come to Ben Wise's thought. Wise said, well, if we could
see what Bayesian judgments are needed to get to the answer of
eight-tenths, we will better understand the belief function analysis. I
am going to resist that. Because I mean what would we need to get to
the eight-tenths? I think what we would need would be prior
probabilities of a half each for whether it was icy outside and the
probability -- I was mentioning these judgments. To get a Bayesian
analysis, I would need a prior probability and a probability that Fred
would be accurate if he were being careless. If the prior
probabilities are half and half and if the probability he is being
accurate if he is careless is zero, then I have the eight-tenths.
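
The arithmetic behind the eight-tenths can be spelled out. This is a sketch under assumptions that go slightly beyond what is said: the claim is treated as a yes/no statement, Fred is either reliable (and then right) or careless, and a careless Fred is accurate with a fixed probability whether or not the streets are icy. The function name is hypothetical.

```python
def posterior_claim_true(prior, reliability, accuracy_if_careless):
    """Posterior that the claim is true, given that Fred asserts it."""
    # Fred asserts a true claim if reliable, or if careless but accurate.
    asserts_given_true = reliability + (1 - reliability) * accuracy_if_careless
    # Fred asserts a false claim only if careless and inaccurate.
    asserts_given_false = (1 - reliability) * (1 - accuracy_if_careless)
    evidence = prior * asserts_given_true + (1 - prior) * asserts_given_false
    if evidence == 0:
        raise ValueError("Fred could never have made this assertion")
    return prior * asserts_given_true / evidence


# The inputs named in the discussion: even prior, 80 percent reliability,
# and a careless Fred who is never accurate.
fred_posterior = posterior_claim_true(0.5, 0.8, 0.0)

# A different, equally hypothetical judgment about careless accuracy.
fred_posterior_coin_flip = posterior_claim_true(0.5, 0.8, 0.5)
```

With those three inputs the posterior is 0.8, matching the belief function answer. Changing only the careless-accuracy judgment to one half moves the posterior to 0.9, which shows how much the Bayesian figure depends on the two extra inputs.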

I just want to resist the idea that when I just take Fred's
reliability as my reason for believing that it is icy outside that I am
doing this. I don't think I am. I am not making these judgments. I am
not making this more complete probability model. I am just depending on
an argument that says that if something happens 80 percent of the time,
and this is an example of that, that that represents a relatively strong
argument for it. It is only an argument.

There is a feeling that if we do a Bayesian analysis we have more
than an argument. When I say, yes, this just sounds like an argument;
it does not sound like a complete analysis. But my point is that the
Bayesian analysis always gives you an argument too. So what are my
reasons for trying to resist this suggestion? Because the logic of the
belief function and the Bayesian analysis are different, so I don't want
to interpret the belief function analysis as a Bayesian analysis.

In trying to explain why the logic is different, I want to say
things like the two arguments make different comparisons to pictures of
chance, and maybe I could convey that to you by saying that they embed
the problem in different sequences of problems. I mean the pull of
probability always has this idea that you can imagine that this is --
you can always go from the subjective to a more objective picture where
you really did not have a repetitive situation.


But the belief function argument in this simple case is obviously
just looking at the repetitive situation of different things Fred says
to me when he comes over to my desk with this funny look on his face.
The Bayesian analysis is looking at an imaginary sequence of
repetitions of a much more precise story corresponding to what is
happening right now.

So there is a different sort of sequence of problems in which
this thing is being embedded, and somehow the fact that the two
analyses agree on a particular answer at this time should not be given
that much weight.

I  don't  know,  this  is  changing  horses  a  little  bit  in  midstream, 
because  it  is  looking  at  a  different  example.  Let  me  throw  this  out, 
since  it  seems  to  be  an  example  that  is  easy  to  follow.  I  have  used  it 
in  several  papers.  George  Hooper  was  the  Bishop  of  Bath  and  Wells  in 
the  1680s.  I  will  tell  you  a  little  about  George  Hooper  really.  There 
was  a  paper  in  the  Transactions  of  the  Royal  Society  published  in  1699 
on  probability,  the  authorship  of  which  was  unknown  to  the  probability 
community.  Part  of  the  history  of  probability  called  it  anonymous 
papers  and  speculated  that  John  Craig  might  have  written  it,  et  cetera, 
et  cetera,  and  people  repeated  that  for  many  years. 

It  turns  out  that  all  of  the  time  the  probabilists  did  not  know 
who  wrote  this  paper  and  the  theologians  knew  perfectly  well.  In  fact, 
they  had  republished  the  paper  in  Bishop  Hooper's  collected  works,  which 
were  published  in  the  1700s  and  again  in  1855.  It  is  an  interesting 
case of what is known in the sense of whether something is known or not.
It  was  known  to  some  people  and  it  was  not  known  to  other  people. 

As  far  as  I  know  Hooper  is  the  first  person  to  refer  to  a  number 
between  zero  and  one  as  a  probability,  to  use  that  name  for  the  numbers 
between zero and one. Now, that may not be right, but I can't name to
you  anyone  who  did  it  earlier  and  Hooper  did  it  in  1685.  He  was  a 
chaplain  to  King  James  II,  I  believe.  Since  King  James  II  was  a  not  so 
secret  Catholic,  being  his  Protestant  chaplain  carried  political 
responsibilities,  and  one  of  the  things  he  did  in  the  course  of  that 
responsibility  was  provide  a  tract  against  the  infallibility  of  the  Pope 
and  that  is  where  he  first  published  this  argument  that  he  was 
interested  in. 

So it was a question of combination of witnesses. And this is
really the other point of this, that it proves that the belief function
is older than Bayes, because Bayes did not write his essay until --
well, it was published posthumously in the 1760s and he was not born
until the 1720s. So here we have these two witnesses who have their
credibilities. $P_1$ is the probability that the first witness is going
to be truthful as opposed to being careless. $P_2$ is the probability
the second one is going to be truthful as opposed to careless. So, we
have these two independent witnesses and they both agree on something
they tell us. What probability does their concurrence give to the
conclusion?


So Dempster's rule, or in this case Hooper's rule, the same thing
in this case, says that the answer is $1 - (1 - P_1)(1 - P_2)$, and the
reasoning behind that is easy to understand. $1 - P_1$ is the
probability that the first one will be careless, and $1 - P_2$ is the
probability that the second one will be careless. If they agree the
only way they can be wrong is if both were careless. The product
$(1 - P_1)(1 - P_2)$ is the probability of that happening and
$1 - (1 - P_1)(1 - P_2)$ is the probability of that not happening.

So  in  general,  this  is  the  probability  that  at  least  one  of  them 
would  be  truthful.  Well,  now  you  have,  that  is  working  on  one 
probability  frame  and  then  this  sort  of  general  rule  I  talk  about 
transferring  probabilities  from  one  frame  to  another  frame  when  you  have 
a  compatibility  relation,  seeing  what  their  testimony  is,  seeing  the 
fact  that  they  agree  on  saying  that  it  is  icy  outside.  Seeing  that 
creates  a  compatibility  relation,  that  says  if  at  least  one  of  them  was 
being  truthful,  then  it  is  true  that  it  is  icy  outside.  So  this  would 
be  a  valid  belief  function  argument. 

For  example,  if  you  gave  each  witness  separately  only  a 
credibility  of  three-fourths,  together  their  testimony  would  carry 
weight  15/16,  .9375.  Okay,  so  we  could  give  a  Bayesian  analysis  of  that 
story.  We  could  say,  well,  this  is  not  right,  because  it  does  not  have 
the  prior  probabilities  and  everything  in  it.  One  way  of  explaining  why 
it  is  not  right  is  it  does  not  take  into  account  the  fact  that  these  two 
guys  agree. 

Let's suppose we made a Bayesian model. When we make a Bayesian
model, we have to put in some additional judgments. We have got to
decide in some way what the prior probabilities are and also what the
probability of a witness being accurate is if he is being careless.
Let's suppose for argument that when they are careless they are always
wrong. In other words, if they are careless they lie to us somehow.

Well, in that case, you can calculate the posterior probability by
Bayesian principles and it comes out not 15/16 but 9/10. So it is a
different answer. But again, I make the same point. If you could make
these additional judgments, if you can construct a convincing argument
that says you have evidence for this kind of a judgment for the
probability that this guy is going to be accurate if he is being
careless, then this is a better argument than the belief function
argument.
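
The two figures quoted for the concurring witnesses, 15/16 and 9/10, can be reproduced with exact fractions. This is a minimal sketch with hypothetical function names. The Bayesian version assumes, as in the discussion, that a careless witness is always wrong; the even prior of one half is an additional assumption that is not stated for this example but does reproduce the 9/10.

```python
from fractions import Fraction
from math import prod


def hooper_rule(credibilities):
    """Weight of concurring testimony: 1 minus the chance all were careless."""
    return 1 - prod(1 - p for p in credibilities)


def bayes_careless_always_wrong(credibilities, prior):
    """Posterior that the claim is true when every witness asserts it."""
    # If the claim is true, all assert it only when all are truthful.
    agree_given_true = prod(credibilities)
    # If the claim is false, all assert it only when all are careless.
    agree_given_false = prod(1 - p for p in credibilities)
    evidence = prior * agree_given_true + (1 - prior) * agree_given_false
    return prior * agree_given_true / evidence


witnesses = [Fraction(3, 4), Fraction(3, 4)]
belief_function_answer = hooper_rule(witnesses)
bayesian_answer = bayes_careless_always_wrong(witnesses, Fraction(1, 2))
```

The first result is 15/16 and the second is 9/10. The difference comes entirely from the two extra judgments the Bayesian model needs, the prior and the behavior of a careless witness.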

But if you can't, then the belief function argument may be the
best you can do. So for Professor Lindley's statement that the
probability is always adequate, I think the answer has to be, well, that
means that you can always write down a beautiful Bayesian analysis,
which would show what we would like to do in the sense that we would
like to have those inputs to make that argument convincing. But
sometimes we don't have the evidence needed to support those judgments.


So  there  is  my  argument. 

DR. DeGROOT: Thank you. Lotfi, do you want to make some
comments  please? 

DR.  ZADEH:  I  would  like  to  focus  my  comments  on  one  recurring 
issue  that  has  been  heard  here  and  that  is  the  issue  of  adequacy  of 
probability theory, and an issue that was very forcefully argued by
Professor  Lindley.  It  seems  to  me  that  there  are  really  three  points 
that  can  be  raised  in  that  connection. 

Professor  Lindley  maintains  in  effect  that  numerical  probability 
theory  is  adequate,  the  theory  in  which  probabilities  are  treated  as 
numbers.  I  think  that  Professors  Dempster  and  Shafer  took  issue  with 
that  and  said,  no,  we  have  to  go  beyond  that,  that  we  have  to  admit 
interval data probabilities and then they said, well, probably that is a
good  place  to  start,  although  they  don't  take  as  strong  a  position  on 
that  as  Professor  Lindley  does. 

My own position is this, that one in many cases has to go beyond
that, in other words, beyond interval-valued probabilities. The question
is not whether one should use probability theory, but what kind of
probabilities one should be allowed to use.

What I am saying, at least in part, is that one should be allowed
to use linguistic probabilities, which are basically imprecise
characterizations of probabilities. Now, the classical linguistic
probability  is  a  special  case  of  the  linguistic  variable.  What  is  a 
linguistic  variable?  Well,  here  is  an  example. 

There  is  something  that  admits  to  a  numerical  characterization  in 
this  case,  but  that  need  not  be  the  case,  like  some  sort  of  numerical 
scale,  tall.  So  we  can  describe  it  in  numbers,  but  there  are  many 
situations  in  which  we  either  do  not  know  really  what  the  number  is  or 
we  will  find  it  unnecessary  to  specify  what  that  number  is  exactly.  So 
it might be sufficient, as we frequently do in everyday discourse, to
say  it  is  medium  or  to  say  it  is  low  or  to  say  it  is  very  low.  You  have 
curves  like  that  which  are  generalizations  of  intervals. 

These  generalizations  of  intervals  are  possibility  distributions. 
Here  is  another  slide  which  shows  that  perhaps  more  simply.  Here  we  are 
talking  about  age.  So  you  have  young,  you  have  old,  you  have  not  young, 
you  have  very  young  and  so  forth.  Now  in  this  sort  of  characterization, 
instead  of  using  the  constant  of  a  unit,  which  is  something  like  the 
canonical  examples  that  we  talked  about  here  before,  we  are  using  two 
primary  fuzzy  sets,  young  and  old. 

Now,  once  these  have  been  calibrated  in  a  particular  context, 
then you use these modifiers like very, rather, quite, somewhat,
extremely, more or less, not very and so forth, to generate other
values, and this is how it will work in the case of probabilities. So
the primary terms are likely and its antonym, unlikely. Then you have
not likely, very likely, not very likely, more or less likely, extremely
likely; and on the other side you have the same sort of thing with
unlikely and here you have mixtures of these, not likely and not
unlikely.

So  what  happens  then  is  this,  once  you  define  or  calibrate  likely 
and  unlikely,  then  from  that  point  on  the  definitions  of  other  terms  can 
be  computed  automatically.  In  other  words,  each  of  these  modifiers  is 
interpreted  as  an  operator.  These  operators  then  act  on  the  primary 
sets and generate other sets. This is the basic idea behind linguistic
variables.  Notice  that  in  this  sort  of  a  thing,  you  can  replace  likely 
by  handsome  and  you  would  have  handsome,  ugly,  not  handsome,  very 
handsome,  not  very  handsome,  more  or  less  handsome,  extremely  handsome 
and  so  forth.  Anything  you  can  think  of  you  can  substitute  in  there  and 
it  would  be  the  same  sort  of  a  thing. 

The point I am trying to make is that in everyday discourse we
use  this  sort  of  a  thing  all  of  the  time.  We  can  and  we  do  use  the  same 
sort  of  things  with  respect  to  probabilities.  So  what  it  boils  down  to 
really  is  this,  that  instead  of  trying  to  force  people  into  the  use  of 
numerical  probabilities,  you  allow  them  to  use  linguistic  probabilities 
with  the  understanding,  however,  that  these  linguistic  probabilities  are 
labels  for  fuzzy  sets. 

So  they  are  not  treated  like  some  labels  that  you  can't  really  go 
inside.  You  can  go  inside  of  these  things.  You  can  make  use  of  the 
more  detailed  structure  of  these  things  and  you  have  a  system  for 
generating  complex  values  out  of  simple  values.  In  effect,  you  have  a 
language. This is what we call a language with a semantic structure,
in the sense that it has a syntax, and given the syntax tree
for any one of these labels, we can compute the
meaning of the label.

So the point I am trying to make here with the simple
examples is this: if this is likely, unlikely is the mirror image of it, the
antonym. Not unlikely is one minus that. More or less likely is
interpreted as the square root of likely. Very likely is
interpreted as the square and so forth. Whether it is square root or
square  is  not  important. 
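
The operator reading of the modifiers described above can be sketched in a few lines. This is only an illustration: the piecewise-linear shape chosen for "likely" and all function names are assumptions introduced here, and the square and square-root operators are the ones mentioned in the remarks, which the speaker himself calls not essential.

```python
import math

def likely(p):
    """Assumed calibration of 'likely': 0 below p = 0.5, rising linearly to 1 at p = 0.9."""
    return min(1.0, max(0.0, (p - 0.5) / 0.4))

def antonym(mu):
    """Mirror image about p = 0.5 (e.g. likely -> unlikely)."""
    return lambda p: mu(1.0 - p)

def negation(mu):
    """'not': one minus the membership grade."""
    return lambda p: 1.0 - mu(p)

def very(mu):
    """'very': square of the membership grade."""
    return lambda p: mu(p) ** 2

def more_or_less(mu):
    """'more or less': square root of the membership grade."""
    return lambda p: math.sqrt(mu(p))

# Composite labels are generated from the single calibrated primary term.
unlikely = antonym(likely)
not_unlikely = negation(unlikely)
very_likely = very(likely)
more_or_less_likely = more_or_less(likely)
not_very_likely = negation(very(likely))
```

Once `likely` is calibrated, every other label is computed rather than elicited, which is the point being made about the syntax tree of a label determining its meaning.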

The  essential  point  here  is  that  it  is  some  sort  of  an  operator 
which modifies the possibility distribution. So if you consider
problems of the order of complexity of what David presented here this
morning, then it seems to me it is necessary for both the patient and the
doctor or diagnostician or clinician, whatever it is, to make use of
characterizations of this kind in situations in which the use of
numerical probabilities cannot be justified on the basis of the
information that is available.

Now  the  same  sort  of  a  thing  applies  to  the  numerical  context  in 
diseases.  What  do  you  mean  by  arthritis?  You  cannot  really  come  up 
with  simple  definitions  of  complex  diseases.  The  same  thing  applies  to, 
for  example,  recession.  What  do  we  mean  by  recession?  You  see  at  this 
point  what  people  try  to  do.  It  is  extremely  simplistic,  like  if  the 
gross  national  product  decreases  in  two  successive  quarters,  then  we  are 
in  recession.  But  that  does  not  capture  the  complexity  of  the  concept. 




So what we have to do then, is we have to consider various
constituents of that concept, like GNP, unemployment, increase in
bankruptcy and so forth and then have a table which tells us that if the
decline in GNP is little and unemployment is low and bankruptcy is high,
then the degree to which we are in recession is not true and so forth.

What  I  am  perhaps  harping  on  is  the  idea  that  in  dealing  with 
complex  issues,  we  are  using  at  this  point  inadequate  tools.  We  are 
simply  not  matching  the  complexity  of  these  concepts  with  a  system  that 
allows  us  to  capture  some  of  this  complexity.  So  this  was  my  point. 

DR.  COHEN:  Are  you  advocating  the  use  of  a  table  like  that? 

DR.  ZADEH:  Yes. 

DR. COHEN: Doesn't it take the place of the use of the fuzzy set
of  recession?  You  can  simply  treat  the  values  under  GNP,  unemployment 
and  bankruptcy  as  evidence  in  the  Bayesian  update. 

DR. ZADEH: This is a different issue.

DR. COHEN: I would think the use of the fuzzy set would be where
it is too complex to create a table like that and so you
simply ask for suggestions, the degree of recession.

DR. ZADEH: Of course, you know, in this presentation I cannot
go into the details. I am speaking of many things. But roughly what is
involved is, for example, in the case of unemployment, low means this;
more  or  less  high  means  this  and  so  forth,  and  then  there  is  a  formula. 
There  is  a  formula  which  takes  this  kind  of  a  table  and  it  is  called  a 
translation  formula,  and  translates  that  into  a  relation  which  is 
defined  no  longer  on  these  labels. 

And from that point on you can interpolate this table, so that if
you have an entry like this is slightly over little and this is more or
less low, something that is not in the table, then you can find the
degree to which you are in recession. So basically it
allows you to interpolate. You cannot interpolate if you stick with
just labels, just as labels. That is what happens.

The  same  sort  of  thing  happens  in  the  case  that  David  presented 
this  morning.  If  you  treat  these  labels  as  simple  anatomical  things, 
you  cannot  interpolate.  You  need  many,  many  more  rules.  If  you  have 
the  capability  for  interpolation,  you  can  get  by  with  fewer  rules. 
Otherwise,  you  have  to  make  a  provision  for  every  eventuality  and  that 
is  impossible. 

Let me then say just very briefly something about belief and
plausibility  and  this  is  a  point  that  I  mentioned  yesterday.  I  do  have 
some  objection  to  the  use  of  the  word  belief  and  my  objection  is  the 
following,  that  basically  then  because  of  the  incompleteness  of  our 
information,  we  cannot  put  in  a  probability  that  has  interval  value.  We 
have the lower bound and we have the upper bound, but by attaching the name
belief to the lower bound, we tend to lose sight of the fact that this
is simply the lower bound, that we are dealing in fact with an
interval-valued probability, and the user is misled into believing that this is all
that the user needs, because the user is told that the belief is .3 and
the user says that is enough. All I am interested in is the degree of
belief. The user is not told that this is simply an interval of which
that is the lower bound.

So the plausibility, although it is present in the
theory, is somehow paid much less attention and they tend to
focus their attention only on one bound.

DR. SHAFER: I know I had my turn, but I do want to say that I
don't regard the belief function thing as giving interval probabilities.
Like Fred's testimony gives me 80 percent reason to believe it is icy
outside and zero to believe it is not, I don't regard that as a bound of
80 to 100. I just regard it as 80 on one side and zero on the other.

A  bound  implies  that  further  evidence  might  give  something  more 
definite  between  80  and  100.  Further  evidence  might  give  something  less 
than  80. 

DR.  DEMPSTER:  That  is  my  vote  too. 

DR.  ZADEH:  I  realize  that,  but  unfortunately  I  don't  have  the 
time  to  go  into  it.  But  I  would  be  prepared  to  argue  this  point  that  we 
are dealing with interval-valued probabilities. That is all we are
dealing  with. 
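
The numbers in this exchange can be reproduced with a small sketch. It assumes the usual simple-support representation of Fred's testimony (mass 0.8 on "icy", the remaining 0.2 on the whole frame); the function and variable names are illustrative only. The sketch computes the two quantities under dispute and does not settle whether the pair should be read as an interval.

```python
def belief(mass, hypothesis):
    """Sum of the masses of focal sets contained in the hypothesis."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)

def plausibility(mass, hypothesis):
    """Sum of the masses of focal sets that intersect the hypothesis."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)

ICY = frozenset({"icy"})
NOT_ICY = frozenset({"not icy"})
FRAME = ICY | NOT_ICY

# Assumed mass assignment for Fred's testimony: 0.8 committed to "icy",
# 0.2 left uncommitted on the whole frame.
fred = {ICY: 0.8, FRAME: 0.2}

bel_icy = belief(fred, ICY)            # 80 percent reason to believe it is icy
bel_not_icy = belief(fred, NOT_ICY)    # zero reason to believe it is not
pl_icy = plausibility(fred, ICY)       # equals 1 - bel_not_icy
```

On the interval reading these values are quoted as the pair (0.8, 1.0); on the other reading only `bel_icy` and `bel_not_icy` are reported, as degrees of support for each side.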

Well,  I  will  stop  at  this  point.  Thank  you. 

DR.  DeGROOT:  Thank  you. 

DR.  LINDLEY:  One  thing  that  has  surprised  me  from  this 
conference  is  the  support  that  everybody  has  given  to  probability.  From 
the readings of writers on belief functions and fuzzy logic, I had not
gotten the impression that probability played a very important role, but
it  is  clear,  I  think,  for  me  at  any  rate,  from  the  discussion  that  it 
does  play  an  important  role  even  in  those  approaches.  And  Professor 
Zadeh  said  yesterday  some  encouraging  words  about  how  he  would  use 
probability  and  Glenn  Shafer  used  encouraging  words  about  how  he  would 
use  probability  if  he  had  enough  information  to  do  so. 

So  I  feel  there  is  a  fairly  solid  base  for  using  probability  and 
what  we  are  really  discussing  is  whether  it  is  going  to  work  all  of  the 
time.  If  I  understand  the  other  people  correctly,  they  are  saying  there 
are  situations  in  which  it  would  be  nice  to  use  the  probability 
argument,  but  circumstances  prevent  it.  That  seems  to  me  to  be  quite  a 
bit  of  progress.  It  is  for  me. 




Let me now look at these inadequacies. A remark of Art Dempster
yesterday very much puzzled me. He said, I am afraid it surprised me so
much I did not write down his exact words at the time, but he said
something to the effect that belief functions were just a generalization
of probability, that probability was known about all of the time and
that it was an addition to them.

This  does  not  seem  to  me  to  be  right,  because  belief  functions 
combine according to his own rules, not according to Bayes rules and the
Great Scorer in the heavens above us will score him rather badly when he
does this. The rules are different and that seems to me to be very
important.

Something else too that Glenn Shafer said I take exception to,
and I did take this one down. He said you can't take things out of a
Bayesian argument and expect it to work. This was in connection with
modularity. You can't take things out of a Bayesian argument and expect
it to work. Now that seems to me to be very surprising, because that is
one of the great strengths of the Bayesian argument, that you can indeed
do just that.

Any  of  you  who  have  done  any  statistics  know  that  one  of  the 
great  strengths  of  the  Bayesian  argument  is  that  you  can  remove  nuisance 
parameters without any difficulty at all. You just integrate them out
and this is one of the great arguments. So if you don't want that thing
or if you don't want that parameter, fine, you just integrate it out.
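
The integration being referred to can be written out as follows. The symbols are assumptions introduced here for illustration: theta for the parameter of interest, nu for the nuisance parameter and x for the data.

```latex
p(\theta \mid x) \;=\; \int p(\theta, \nu \mid x)\, d\nu
\;=\; \frac{\int p(x \mid \theta, \nu)\, p(\theta, \nu)\, d\nu}
           {\iint p(x \mid \theta', \nu')\, p(\theta', \nu')\, d\nu'\, d\theta'}
```

The unwanted quantity is removed by marginalizing the joint posterior, which requires a joint prior over both parameters but no additional rule beyond the laws of probability.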

And  so  this  argument,  this  statement  did  seem  to  me  to  be 
unsatisfactory. 

Another  feature  too  which  does  seem  to  me  to  be  a  little 
confusing  is  the  role  of  frequency  in  the  belief  function  argument.  For 
example,  today  he  talked  about  imbedding  in  a  sequence  of  problems.  The 
Bayesian  argument  has  nothing  whatsoever  to  do  with  imbedding  in  a 
sequence of problems. It is a one-off judgment using the information
that  you  have.  The  information  may  refer  to  some  other  problems,  but  it 
has  nothing  to  do  with  it  and  this  is  the  great  dispute  between  the 
classical  statisticians  and  the  Bayesians. 

Returning now to Professor Zadeh, he made one statement
yesterday, that Bayesians assume that we can make up for incompleteness
by subjective probability. I hope that you understand that that is
incorrect. I at any rate do not assume this. Certainly to make that
assumption seems to me to be gross. In fact, I might add a little bit
of personal history.

I was taught statistics at Cambridge by what I think of as the
leading Bayesian, Harold Jeffreys, and I just did not go along with
Harold Jeffreys, because he asked me to believe that you would combine
your judgments by the laws of probability and that frankly was too much
for me to swallow and that was the reason I did not accept his argument.

Then along comes Savage and then I learned Ramsey had done this
before and De Finetti and there is a convincing argument to it, so one
does not assume these things. One in fact proves them. In fact, if you
read Harold Jeffreys very carefully and a little bit charitably, I
admit, you can see that he is in fact edging towards a proof, that he
really was not making this assumption in his stuff.

There  was  another  statement  made  by  Professor  Zadeh,  this  one  was 
on  the  screen,  so  I  am  sure  I  have  got  it  right,  what  matters  in 
decision analysis is the usual rather than the expectation. I have been
doing  some  work  in  connection  with  nuclear  power  stations  and  the  usual 
thing  with  nuclear  power  stations  is  that  they  work  and  they  produce 
electricity  99.9  percent  of  the  time.  But  what  really  matters  is  what 
is  going  to  happen  on  that  very,  very  unusual  circumstance  when  they 
don't  work.  In  fact,  all  of  the  research  goes  into  that  activity.  There 
is  an  example  that  that  statement  is  certainly  not  correct. 

He  made  some  remarks,  you  may  remember  too,  about  car  insurance, 
that  you  could  not  evaluate  the  probability  that  your  car  will  be 
stolen.  Now,  this  actually,  this  sort  of  thing  happens  quite  a  lot  of 
the  time.  I  have  done  quite  a  bit  of  consulting  for  an  insurance 
company. If you are in the situation, and I think it is quite realistic,
where you have difficulty in evaluating the probability that
your car will be stolen, think about how much insurance you are prepared
to pay. Because one can work out from the amount of insurance you are
prepared to pay what the probability is. You don't have to ask people
probability questions in order to find out what that probability is.
Observe also that, and I don't know what the law is in the United States, it varies
from state to state, if taking out car insurance is voluntary, people do
decide whether to do it or not and they are tacitly making an assumption
about  that  probability. 
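
The step from a premium to a probability can be sketched as below. This is a deliberately crude illustration under a strong assumption not stated in the remarks: the person is treated as risk neutral, so that the largest premium he or she would pay equals the expected loss. A full treatment would bring in utilities. The function name and the figures are hypothetical.

```python
def implied_probability(max_premium, insured_loss):
    """Probability of loss implied by the largest acceptable premium,
    assuming risk neutrality: max_premium = probability * insured_loss."""
    if insured_loss <= 0:
        raise ValueError("insured_loss must be positive")
    if not 0 <= max_premium <= insured_loss:
        raise ValueError("max_premium must lie between 0 and insured_loss")
    return max_premium / insured_loss

# Hypothetical figures: willing to pay at most 300 to insure a car worth 10000.
p_theft = implied_probability(300, 10000)
```

The probability is never asked for directly; it is read off from a decision the person is prepared to make.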

I was delighted, of course, with David Spiegelhalter's talk. But
even he felt that probability was not quite adequate all of the time
and, again, I am afraid I don't agree. For example he talked about the
likelihood of the union of A1 and A2 given X and suggested that it was
the maximum of the likelihoods. This is not right. You cannot infer
what the likelihood of A1 union A2 is entirely in terms of the separate
likelihoods.

This  was  a  point  made  by  Fisher  when  he  introduced  likelihood. 
There  is  no  formula  involving  maximum  or  anything  else  that  will  do  the 
job  for  you. 
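
The dependence on more than the separate likelihoods can be written out directly. A minimal sketch, assuming A_1 and A_2 are mutually exclusive hypotheses with prior probabilities P(A_1) and P(A_2), which are not given in the talk:

```latex
P(X \mid A_1 \cup A_2)
  \;=\; \frac{P(X \mid A_1)\,P(A_1) + P(X \mid A_2)\,P(A_2)}{P(A_1) + P(A_2)}
```

Under that assumption the likelihood of the union is a prior-weighted average of the separate likelihoods, so it never exceeds their maximum and cannot be computed without the weights.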

He  talked  about  Idiot  Bayes  and  I  agree  with  him  it  is  a  bit 
idiot.  What,  of  course,  we  want  to  do  is  pukka* Bayes,  but  if  we  can't 
do  pukka  Bayes  then  we  can  do  kuccha* Bayes. 


*Pukka  in  Hindi  means  ripe;  kuccha  means  raw. 




One  thing  that  nobody  has  addressed  at  this  conference  is  the 
problem  of  decision  making.  I  still  do  not  know  how  the  fuzzy  folk  or 
the  believers  make  up  their  minds  as  to  that. 

I  made  a  challenge  that  anything  that  could  be  done  by  these 
other methods could be done by probability. So I am just trying to beat
the Bishop of Bath and Wells. I haven't much time to do it. Here is
the Bishop of Bath and Wells and the Bayesians and we want to know
whether event A is true or not, so we want to calculate the probability
of event A. The evidence we have is a1 and a2. Witness one said it is
so and witness two said so. But to repeat, a1 and a2 are the pieces of
evidence from witness one and witness two and A is the event.

Witness one said it was true and witness two said it was true.
So the Bayes argument begins by saying what is it you don't know? We
don't know whether the event was true or not. What do we know? We know
a1 and a2. So therefore on the left-hand side the thing I want to
calculate is the odds on event A, given a1 and a2, and on the right-hand
side, I have put it down in Bayes form.

Now, I have to do some calculations and the first thing that I
realize is that there is nothing in the data that Glenn Shafer put on
the screen to enable me to go any further. Because he did not tell me
anything about the probability that these two witnesses would separately
state a1 and a2. Perhaps he meant they were independent.

If they were independent, then I could do the following, provided
I recognized that they were independent given the event is true and also
given the event is false. This reminds us that whenever we are
considering evidence, we have to consider the evidence on the
supposition of guilt and on the supposition of innocence. And it is
extraordinary to me that a lot of the writing about witnesses carrying
on from the Bishop of Bath and Wells failed to recognize that. They
talk about the reliability of witnesses, as though it were one number.
It is not.

The  witness's  reliability  is  two  numbers  -  the  probability  that 
he  would  say  this  when  it  is  true,  and  the  probability  that  he  will  say 
the  same  thing  when  it  is  false.  Both  of  those  things  are  relevant  and 
there  are  circumstances  in  which  they  can  be  entirely  different.  That 
is,  the  person  could  be  extremely  reliable  when  the  event  is  true  and 
extremely  unreliable  when  the  event  is  false. 

So,  therefore,  the  fact  that  the  Bayes  handle  has  produced  this 
result  seems  to  me  to  be  one  up  for  Bayes.  Now,  let's  assume  they  are 
independent and if I do assume they are equally reliable on A and not A, I
get that result and that is the result that Shafer put on the board. So
we now see that in order to get this result we have had to make two
very important assumptions. The first assumption is independence and
the  second  assumption  is  equal  reliability. 
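
The slide itself is not reproduced in the transcript, so the following is only a reconstruction of the argument as described. The notation is assumed here: a bar denotes "not A", and p_i = P(a_i | A) is the probability that witness i makes the statement when the event is true.

```latex
\frac{P(A \mid a_1, a_2)}{P(\bar{A} \mid a_1, a_2)}
  \;=\; \frac{P(a_1, a_2 \mid A)}{P(a_1, a_2 \mid \bar{A})}
        \cdot \frac{P(A)}{P(\bar{A})}
  \;=\; \frac{P(a_1 \mid A)\,P(a_2 \mid A)}{P(a_1 \mid \bar{A})\,P(a_2 \mid \bar{A})}
        \cdot \frac{P(A)}{P(\bar{A})}
  \;=\; \frac{p_1\,p_2}{(1 - p_1)(1 - p_2)} \cdot \frac{P(A)}{P(\bar{A})}
```

The first equality is Bayes' theorem in odds form. The second uses independence of the two witnesses given A and given not A. The third reads "equal reliability" as P(a_i | not A) = 1 - p_i, which is an interpretation rather than something stated explicitly in the remarks.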


DR. WISE: If you assume p1 and p2 are nine-tenths, doesn't it
give a ratio of 81 to 1 and a probability of 81/82 instead of what you
got?


DR.  LINDLEY:  I  don't  think  it  does. 

DR.  DeGROOT:  Let's  continue  on.  I  think  you  can  worry  about 
that  later. 

DR. LINDLEY: But I had to make two assumptions here. What I say
to you is this, don't you think that those two things are relevant?
Don't you think it is relevant to think whether those two witnesses are
independent or not? An argument that does not take account of that, do
you feel happy with it? Do you really feel happy with not having to
bother with whether those witnesses are independent? Do you really feel
happy with not having to bother whether the event was true or false? Do
you really feel happy about not having to put P(A) in?

Suppose  you  knew  A  was  almost  certainly  true,  would  you  really 
want  to  discount  it?  Do  you  really  feel  happy?  You  see  there  are  only 
two  possibilities.  Either  the  argument  that  Glenn  produces  agrees  with 
the  Bayesian  argument  or  it  does  not.  If  it  does  not  agree  with  the 
Bayesian  argument  then  the  great  scorer  will  have  it. 

If  it  does  agree  with  the  Bayesian  argument,  then  he  is  making 
some  assumptions  somewhere  and  I  ask  you  are  those  assumptions 
reasonable?  Now,  I  would  not  guarantee  that  this  piece  of  calculation 
is  correct.  I  am  not  very  good  at  doing  calculations  quickly  and  I  was 
trying  to  listen  to  Professor  Zadeh  at  the  same  time,  but  it  did  appear 
to me when the calculations are done that Dempster and this independent
Bayes will agree if this holds. That is a very curious statement: if
the probability that the event is not true is equal to the probability that the
witnesses will say a1 and a2.

This  has  nothing  to  do  with  these  probabilities  up  here.  This  is 
the probability that they will say a1 and a2 unconditionally. So
if  you  are  a  Dempster,  you  are  making  that  sort  of  statement.  Is  that 
really  reasonable?  Do  you  really  believe  that? 

What I am saying to you is that if you do the probability
argument in full, turning the mechanical handle of the calculation, it
will show you there are certain things you have got to think about.
Think about them. In this case they are relevant, and I think you will
always find that they are relevant, and an argument that does not use them, it
seems to me, is very suspicious.

Thank  you. 

DR. DeGROOT: Thank you. David, do you want to take a couple of
minutes? 




DR. SPIEGELHALTER: I want to talk purely on a practical level
rather than arguing about the theory of any of these approaches. First,
on the fuzziness, there are two levels in which Professor Zadeh has been
saying  fuzziness  can  be  used.  The  top  level  of  fuzziness  has  to  do  with 
whether  the  propositions  themselves  are  ill-defined  or  not.  I  have  a 
couple  of  pictures  that  I  will  use,  that  I  stole  from  other  people,  to 
illustrate  the  difference  between  a  well-defined  proposition  with  sort 
of  probability  attached  to  it  and  an  ill-defined  proposition  which  has  a 
degree  of  truth  attached  to  it. 

This one, is it fish or fowl? It sort of has fish on some faces
and  fowl  on  the  other.  They  have  a  degree  of  fish  and  fowl. 

Also  another  example  I  used  for  a  proposition  is  that  I  can  read 
the  next  overhead.  Well,  this  is  only  partly  true.  I  know  that  is  a 
fuzzy  something  or  other.  I  have  no  idea  what  it  is. 

DR.  DeGROOT:  Does  that  mean  that  fuzzy  is  untranslatable? 

DR.  SPIEGELHALTER:  My  argument  was  that  it  is  both 
unsatisfactory  to  have  fuzzy  propositions  and  it  can  be  avoided  in 
general  by  identifying  the  propositions  used  by  crispifying  them  in 
terms of the actual interaction that one has with the system.

The  second  order  of  fuzziness  is  when  we  start  saying  we  are 
going  to  use  probabilities  but  we  are  not  quite  sure  what  they  are.  And 
should  we  in  fact  say  it  is  low,  around  about  .2  or  something  like  that? 
Again,  I  would  say  from  the  examples  used  this  morning  that  when  we  do 
ask  people  probabilities  and  they  don’t  know  what  they  are,  if  you  sit 
them  down  and  talk  to  them  hard  enough,  they  will  give  you  some  idea  of 
ranges  and  draw  curves  and  they  will  do  it. 

And  then  there  are  data  probabilities.  You  might  not  even  feel 
too happy about the curves they have drawn, but at least it tells you
where on the input you need more information, how you can update
those probabilities and how you can learn about those probabilities.

So  both  in  the  top  level  and  on  the  second  level,  I  feel  that 
fuzziness  can  be  avoided  and  I  would  like  to  avoid  it. 

Then as to the argument on the practical thing about the belief
functions, I am not going to argue about the theory and justification
for it. What I went over this morning, what I wanted to argue, was that
there are certain behavioral demands for belief functions from people
designing expert systems. Specifically, it is because they want to work
with hierarchies of hypothesis structures and a hierarchy of taxonomy.
They want to deal with ignorance and one will say, well, we just don't
know anything about the lower levels of this hierarchy, and because
belief functions provide a method of identifying sources of evidence
explicitly, you can identify what contribution each source of
evidence makes towards the final conclusion.


And  all  of  these  seem  to  be  very  reasonable  demands,  but  my  claim 
is  that  they  all  can  be  dealt  with  within  a  probability  calculus.  One 
has to put more in because one ideally wants to define a joint
distribution  over  all  possible  propositions.  And  you  are  necessarily 
going  to  have  to  use  all  sorts  of  approximations  and  ways  of  padding  out 
distributions. 

But  essentially  what  I  went  over  this  morning  was  designed  to  say 
that you can cope with hierarchies and you can cope with ignorance and
in fact, within a closed expert system, provide an operational
definition of ignorance in terms of possible beliefs that you may have
when further information comes in. One can identify sources of evidence
through this rather crude but effective way of showing how individual
events and evidence have changed your beliefs in the past.

Coming  on  to  Professor  Lindley's  points,  I  have  in  fact  changed 
my  mind,  I  think,  since  this  morning,  since  talking  to  him  and  Ben  Wise. 
My  feeling  is  that  the  limitations  of  probability  are  when  you,  for  some 
reason,  want  to  use  ill-defined  propositions  or  you  want  to  use 
propositions  that  are  not  strictly  verifiable. 

I  had  previously  thought  that  maybe,  in  cases,  for  reasons  of 
control,  you  might  want  to  use  something  that  was  not  strictly 
probabilistic.  If  you  talk  to  people  from  other  areas  and  they  talk 
about,  well,  you  have  got  this  situation  where  you  want  to  decide 
whether  to  trigger  a  particular  set  of  rules,  a  particular  set  of 
possible  hypotheses,  very  much  in  the  INTERNIST  idea;  or  trying  to 
develop  a  differential  diagnosis,  trying  to  structure  your  problem.  In 
doing  that  structuring  perhaps  you  might  want  to  use  ideas  of  relevance 
that  a  particular  symptom  makes  you  want  to  look  at  a  particular 
disease.  That  idea  of  relevance  could  be  related  to  whether  that 
symptom,  in  some  way,  provides  a  description  of  that  disease  that  gives 
some  support  to  that  disease.  It  might  be  in  terms  of  using  a  calculus 
that  supports  what  a  set  of  data  gives  to  a  set  of  possible  hypotheses, 
in  fact  maximum  support  to  any  particular  member  of  the  hypothesis  and 
the  maximization  of  a  likelihood  is  looking  like  the  sort  of  thing  that 
is  done;  something  comparing  two  groups  and  hypotheses  in  likelihood 
ratio  tests,  in  which  one  does  not  maximize  over  the  likelihood. 

I have changed my mind since this morning. I don't think that is
necessary and I believe one should be able to work within the
probability calculus: at any time you are considering extending your
frame of concern and considering new hypotheses, these should be
brought in and a probability distribution should be assessed on
those hypotheses and a decision to pursue a particular line of reasoning
can be based on a probability.

So  in  that  way,  I  have  become  a  bit  more  extreme  during  this 
discussion. I believe if you are working with theoretically verifiable
propositions,  then  one  need  only  consider  the  probability. 


DR. DeGROOT: On that happy note, I now reveal myself as a true
believer. I have sort of felt over the last couple of days like the
Barbara Walters of the theory of uncertainty or something and we have now
come down to the wire and I think that we should not leave until we have
settled this issue. So are you ready to vote?

(laughter) 

DR.  DeGROOT:  I  want  to  thank  the  speakers  and  all  of  you  for 
participating.  We  adjourn  the  session. 


RETROSPECTIVE  COMMENTS  ON  PAPERS  AND  PRESENTATIONS 


STEPHEN  R.  WATSON 
Cambridge  University 


1.  Introduction

These notes contain my comments as a discussant at the conference
on the Calculus of Uncertainty in Artificial Intelligence and Expert
Systems, which was held at George Washington University on 27 and 28
December 1984. In the next four sections I give an account of the
points that I made at the end of the four main talks at the conference,
by Professor Glenn Shafer, Professor Lotfi Zadeh, Professor Dennis
Lindley and Dr. David Spiegelhalter. In section six I present some
summary comments which were not made at the conference, but are made now
as a result of my reflection on what was said at the conference.

2.  Comments  on  the  contribution  of  Professor  Shafer 


One of the things that makes Shafer's theory interesting is that
it can be seen as an alternative to traditional probability theory.
Is this really so, however? Firstly, note that one of the strengths of
subjective probability theory is the clear-cut nature of the axiomatic
support for the theory. Indeed, as Professor Lindley's contribution
showed, it is possible to claim that probability theory is the only
theory one could possibly use to represent uncertainty. Shafer's theory
does not as yet have such clear-cut support. For example, although
Shafer recognizes the importance of canonical examples, as yet belief
function theory is not provided with the same axiomatic development that
is available for probability theory.

It  can  be  claimed,  however,  (see  Dempster's  contribution  at  this 
conference)  that  belief  functions  are  indeed  rooted  in  probability 
theory.  It  is  just  that  the  probability  is  associated  with  a  power  set 
rather  than  a  simple  set.  If  this  interpretation  of  belief  function 
theory  is  accepted,  then  indeed  there  is  no  problem  because  the 
philosophical  support  for  probability  theory  clearly  also  will  support 
belief function theory. However, Professor Shafer seems in some of his
writings not to be very happy with this interpretation of his theory.
And if he rejects this interpretation then the problem of a
philosophical foundation for belief function theory remains.

The second point I make here concerns concepts of independence.
Professor Shafer touched on this point in his talk, but it is worth
saying again that concepts of independence in belief function theory are
not yet clear. Firstly, in the application of Dempster's rule to
determine the support for a hypothesis on the basis of two pieces of
evidence, there is a rather vague notion that the two pieces of evidence
should be independent in some way. The detailed meaning of this concept
of independence is far from clear. Shafer recognizes this difficulty
and in his discussion of frames is attempting to overcome it. It is
sufficient to say at this point, however, that we do not yet know how to
handle dependence concepts in belief function theory in a way which is
intuitively understandable.
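
To make the role of the independence assumption concrete, the following
is a minimal sketch of Dempster's rule for two pieces of evidence. It is
an illustration only: the frame of discernment, the diagnoses and the
mass values are hypothetical, and the rule is applied as though the two
pieces of evidence were independent, which is exactly the assumption
whose meaning is questioned above.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) by Dempster's rule.

    The rule treats the two bodies of evidence as independent; that
    assumption is taken for granted here, not checked.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Mass of a non-empty intersection accumulates on that subset.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Mass falling on the empty set measures the conflict.
            conflict += wa * wb
    if 1.0 - conflict <= 1e-12:
        raise ValueError("evidence is totally conflicting; rule undefined")
    # Renormalize so the remaining masses sum to one.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, hypothesis):
    """Belief in a hypothesis: total mass committed to its subsets."""
    return sum(w for s, w in m.items() if s <= hypothesis)


# Hypothetical frame and evidence, chosen only for illustration.
FRAME = frozenset({"flu", "cold", "allergy"})
m1 = {frozenset({"flu"}): 0.6, FRAME: 0.4}    # first piece of evidence
m2 = {frozenset({"cold"}): 0.7, FRAME: 0.3}   # second piece of evidence

m12 = dempster_combine(m1, m2)
for subset, mass in sorted(m12.items(), key=lambda kv: len(kv[0])):
    print(sorted(subset), round(mass, 4))
print("belief in flu:", round(belief(m12, frozenset({"flu"})), 4))
```

If the two pieces of evidence were in fact dependent (for instance, two
readings of the same test), multiplying the masses in this way would
count the same information twice, and the rule itself gives no signal
that this has happened.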


3.  Comments on the contribution of Professor Zadeh

Firstly, note that Zadeh sees fuzzy set theory as a companion to
probability theory, not as a replacement for that theory. Thus he sees
some situations in which the use of the probability calculus is
appropriate, but others where it is inappropriate. Indeed he sees fuzzy
set theory as a calculus for handling imprecise entities rather than
uncertain entities. Imprecision is a property which scientists have for
many years been keen to avoid; yet one of the main reasons for Zadeh's
introduction of the concept of the fuzzy set was his belief that in
Systems Analysis a stress on precision was misleading. It is always
possible for the probabilist to claim (as indeed Professor Lindley did
during this conference) that in any context imprecision can be modeled
using probability theory. For example, if you are imprecise in giving
me some information, I am uncertain about what is actually the case. If
you tell me that John is tall, I am uncertain about his precise height.
It is thus clearly possible to avoid the need to introduce fuzzy sets;
the cost of doing so, however, is to produce an enormously complex
probabilistic framework which may well be impossible to analyze. (I
will return to this point when I discuss Professor Lindley's
contribution.) To seek, therefore, to handle imprecise concepts
directly, rather than to introduce precision and accompanying
uncertainty, seems to me to be a virtuous aim. To the extent, therefore,
that computations using fuzzy set theory give sensible results, it
seems a useful heuristic to follow.

It must be admitted, however, that problems exist in using fuzzy
set theory. Perhaps the most obvious of these is the origin of the
numbers that go to make up a membership function. As I mentioned in
section two, Professor Shafer agrees with the probabilists that one
needs canonical examples in order to provide meaning for the
mathematical constructs one uses. Such examples do not exist within
fuzzy set theory. When taxed with this question (as indeed he was at
this conference) Zadeh points out that people seem to have an intuitive
idea of how to provide such numbers, and that if we bother too much
about precisely what the numbers are, we vitiate the whole spirit of the
approach, which is concerned with the representation of imprecise
quantities. But this does not answer the problem fully. The open
question in my mind is how sensitive the outputs of fuzzy analyses are
to the representation of imprecise concepts by membership functions. If
outputs are indeed sensitive then the precise choice of a membership
function is rather important, and at present there is no guidance within
the literature on how to choose one membership function rather than
another. On the other hand, if the answers are insensitive to these
representations, then one wonders if the outputs of a fuzzy analysis can
actually tell one anything.


Then again, there are the connectives. When Zadeh introduced
fuzzy set theory in the first place, he suggested the max and min
operators for the connectives 'or' and 'and' respectively. There are,
however, a great number of other operators which could be used to
represent these connectives, and have many of the same properties (such
as reducing to the traditional operators in the case of crisp sets). It
appears at present that within fuzzy set theory, there is no protocol
for determining which of these connectives to use; rather one is advised
to use whichever seems sensible in any given context. This emphasizes
fuzzy set theory as a heuristic approach. It must be thought of as a
reasonable way to get sensible results in a complex analysis, rather
than a 'correct' approach following on from believable and acceptable
axioms.
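
The point about competing connectives can be illustrated with a short
sketch. The max and min operators are those mentioned above; the product
and "probabilistic sum" pair is one commonly used alternative, chosen
here purely as an assumption for illustration and not named in the
discussion. The membership grades are hypothetical.

```python
def and_min(a, b):
    """Zadeh's original 'and': the minimum of the two grades."""
    return min(a, b)


def or_max(a, b):
    """Zadeh's original 'or': the maximum of the two grades."""
    return max(a, b)


def and_product(a, b):
    """An alternative 'and' (assumed for illustration): the product."""
    return a * b


def or_probsum(a, b):
    """An alternative 'or' (assumed for illustration): a + b - ab."""
    return a + b - a * b


# On crisp sets (grades 0 or 1) both pairs reduce to ordinary logic.
for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        assert and_min(a, b) == and_product(a, b) == float(bool(a) and bool(b))
        assert or_max(a, b) == or_probsum(a, b) == float(bool(a) or bool(b))

# On genuinely fuzzy grades the two pairs disagree.
tall, heavy = 0.7, 0.4  # hypothetical membership grades
print("tall and heavy:", and_min(tall, heavy), "vs", and_product(tall, heavy))
print("tall or heavy: ", or_max(tall, heavy), "vs", or_probsum(tall, heavy))
```

Both pairs agree on crisp inputs, so that property alone cannot choose
between them; on the fuzzy grades the answers differ, which is why the
absence of a protocol for selecting connectives matters.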

Finally, we should comment on the blandness of fuzzy set theory.
Because it deals with possibilities rather than probabilities, it is
quite easy to create input membership functions which are so broad (in
the sense of allowing a large number of possible values to have nonzero
possibility) that the output fuzzy distributions are extremely broad.
This is not surprising. The more imprecise the inputs we put into an
analysis, the more imprecise we can expect the output of the analysis to
be. I am not sure if this can be articulated into a general principle,
since one can clearly construct examples where imprecision does not
build on itself in this way. None the less, it is my impression that
fuzziness can get out of hand. In such circumstances, of course, the
solution is to go back to the beginning and be more precise, and a
probabilistic analysis would demonstrate the need for this.

In  summary  then,  I  see  fuzzy  set  theory  as  a  sensible  heuristic 
way  of  describing  imprecise  concepts,  and  of  breaking  through  the 
complexities  of  other  kinds  of  analysis.  The  fact  that  it  is  a 
heuristic,  however,  means  that  we  can  never  be  certain  that  the  results 
of  the  analysis  make  sense. 


4.  Comments on Professor Lindley's contribution


The  conviction  with  which  Professor  Lindley  speaks,  and  the  sheer 
power  of  his  argument  impel  users  of  alternatives  to  probability  theory 
to  respond  to  his  arguments.  If  we  do  not  accept  the  inevitability  of 
probability,  why  not? 


Users of Shafer's theory or Zadeh's theory can respond, and in fact
have responded in the past, that they do indeed accept the inevitability
of probability. As Dempster has commented, belief function theory is
founded on probability, and so there is no contradiction in using belief
function theory at the same time as using probability theory. Moreover,
as I have argued, one can think of fuzzy set theory as being a heuristic
approach in situations where a full probabilistic analysis is far too
complicated to be undertaken.


It is, however, also possible to take issue with Lindley's
argument. In other words, it is possible to question some of the
premises in his argument and thereby avoid the full power of his
conclusions. Firstly, if one investigates the development of subjective
probability theory exemplified by Savage's approach, it is possible to
ask whether we are prepared to accept the axioms. It is a commonplace
now that people do not behave as though they accept Savage's axioms,
reasonable as they undoubtedly are. Of course, these axioms are
normative and it can be argued, as indeed Lindley does argue, that the
fact that we fail to abide by the axioms does not mean that we should
not attempt to do so. Indeed he would say that the first act of a
rational man is to agree to the axioms, and then attempt to construct
his behavior in accordance with these axioms. If, however, we are not
prepared to do this, then what happens to us is a matter of practice.
It could be argued that if we are consistent in our failure to abide by
the axioms, then our opponents can turn us into a money pump or
construct a Dutch Book of gambles against us. Of course, we do not do
this in practice. We just recognize when we are about to get cornered
in this way, and change one of our judgments, possibly in a yet more
inconsistent way with our past judgments. There is, therefore, nothing
mandatory about accepting Savage's axioms, and we can therefore escape
Lindley's conclusions if we wish to.
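
For readers unfamiliar with the Dutch Book argument, the following is a
minimal sketch of how it runs in the simplest case. The numbers are
hypothetical, and the sketch assumes an agent who treats each stated
number as a fair price for a ticket paying one unit if the event occurs,
and who never revises those prices.

```python
def agent_net_payoff(prices, outcome_index, stake=1.0):
    """Net gain to an agent who buys one ticket on every event.

    `prices` are the agent's stated numbers for a set of mutually
    exclusive and exhaustive events; each is treated as the fair price
    of a ticket paying `stake` if that event occurs.
    """
    cost = stake * sum(prices)
    winnings = sum(stake for i in range(len(prices)) if i == outcome_index)
    return winnings - cost


# Hypothetical incoherent judgments: P(A) = 0.6 and P(not A) = 0.6.
incoherent = [0.6, 0.6]
# Hypothetical coherent judgments for comparison.
coherent = [0.6, 0.4]

for label, prices in (("incoherent", incoherent), ("coherent", coherent)):
    payoffs = [agent_net_payoff(prices, k) for k in range(len(prices))]
    print(label, [round(p, 2) for p in payoffs])
```

The incoherent agent loses whichever event occurs, while the coherent
agent breaks even. The sketch also shows where the argument can be
resisted: it depends on the agent holding the prices fixed, whereas in
practice one changes a judgment on seeing the book being built.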

In his contribution Professor Lindley gave a very clear account
of an alternative way of showing the inevitability of probability.
This  was  based  on  the  notion  of  scoring  systems.  It  is  indeed  quite 
remarkable  that  no  matter  what  kind  of  scoring  system  one  adopts,  the 
numbers  that  one  employs  to  describe  uncertainty  must  (after  an 
appropriate  transformation)  satisfy  the  rules  of  probability  theory. 
Compelling  as  this  argument  is,  we  have  to  point  out  that  in  practice  no 
Great  Scorer  exists.  There  is  nobody  hovering  about  us  being  prepared 
to  dock  our  pay,  should  we  use  numbers  which  fail  to  conform  to  the 
rules  of  probability  theory  in  our  descriptions  of  uncertainty.  Thus 
while  the  argument  is  elegant  and  powerful,  there  is  nothing  inherently 
irrational  in  not  accepting  it,  because  in  practice  scoring  systems  do 
not  exist. 
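
A small numerical sketch may help to show what the scoring argument
claims. It uses the quadratic (Brier) score as one particular scoring
rule; that choice, and the numbers, are assumptions made for
illustration, and the sketch is a special case rather than the general
argument presented at the conference.

```python
def quadratic_score(forecasts, outcome_index):
    """Quadratic penalty (lower is better) for numbers stated on a set of
    mutually exclusive and exhaustive events, given which one occurred."""
    return sum(
        (q - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, q in enumerate(forecasts)
    )


# Hypothetical numbers for the events A and not-A.
incoherent = [0.6, 0.6]  # does not sum to one
coherent = [0.5, 0.5]    # obeys the rules of probability

for k, event in enumerate(("A occurs", "A does not occur")):
    print(
        event,
        "incoherent:", round(quadratic_score(incoherent, k), 2),
        "coherent:", round(quadratic_score(coherent, k), 2),
    )
```

Whichever event occurs, the coherent numbers receive the smaller
penalty, so the incoherent numbers are dominated. The force of this
depends, as noted above, on a scorer actually applying the penalty.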

Of course the proof of the pudding is in the eating. If it can
be shown that in the long run any person who fails in his assessment of
uncertainty to combine his numbers as though they were probabilities
will lose out inexorably, then indeed we have a problem in refusing to
accept probability theory. But to my understanding practical proofs of
this kind are not yet available.

Thus  it  is  possible  to  escape  the  inevitability  of  probability; 
it  has  to  be  admitted,  however,  that  there  is  no  alternative  theory 
which  has  the  strength  of  support,  and  elegant  support  at  that,  which  is 
available  for  probability  theory. 

The chief drawback with using probability theory is the
complexity that sometimes results, and the need to assess an often
surprisingly large number of conditional probabilities. In legal work,
for example, great difficulty can arise; some interesting work by Schum,
at Rice University, shows how problematic probabilistic inference can
get. In one simple murder case, with five pieces of evidence, he
needed to make 27 probability assessments. Professor Lindley suggested
the principle of Occam's razor should be applied to our topic: simplify
where possible. Sometimes probabilistic analysis is far from simple.


5.  Comments on the contribution of Dr. Spiegelhalter


Dr. Spiegelhalter's talk was a most interesting account of the
construction of an expert system for medical diagnosis. In his talk he
gave us some important insights into the practical problems of
constructing an expert system that is both computable and useful.
This raises the general question of how one determines whether
a particular expert system, as represented on some computer, is actually
a good one or not. The issues involved are very similar to those
involved in validating a model. Firstly, one needs the system to be
faithful to some normative principle. This entire conference has been
concerned with the appropriate normative principle to use in
representing uncertainty in expert systems, and in my view one should
start with probability theory, but be prepared to adopt other approaches
as heuristics or as richer representations of the issues involved. It
seems that Spiegelhalter's approach has been similar.

Secondly, one could validate an expert system by its comparison
with expert performance. One can ask whether the diagnosis achieved by
Spiegelhalter's system was better or worse than that achieved by
competent diagnosticians. There is of course a debate over whether an
expert system should be compared in this way. Is the goal to reproduce
the abilities of an expert, or to improve on the abilities of available
human judges? If it is the former, then indeed it is sensible to
compare performance with experts, but in this case one wonders why one
should not use the experts themselves. This could be answered by
observing that very often experts are in short supply. If, on the other
hand, our goal is to improve on human inference behavior, then the
criterion of conformity with some expert performance is not appropriate.

A final measure of the appropriateness of an expert system is user
satisfaction. To what extent do the people who interact with the expert
system feel that the system is of use to them? In Spiegelhalter's case
there are two kinds of people involved, namely the patients and the
doctors. As Spiegelhalter observed, it is very important that the
doctors are supportive of the endeavor, and do not feel that their
professional competence is in any way being threatened. It is perhaps
more important, however, that the patients feel that they are being
properly attended to. Spiegelhalter seems to have achieved success on
both fronts.

6.  Summary  comments 

Although  the  purpose  of  the  conference  was  to  discuss  the  use  of 
the  different  theories  for  the  representation  of  uncertainty  in  expert 
systems,  the  principal  speakers,  perhaps  wisely,  devoted  their 
discussion  mainly  to  arguing  the  cases  for  the  use  of  their  different 
theories  in  general.  On  the  basis  of  the  discussions  we  had  at  this 
conference,  it  seems  to  me  that  one  can  summarize  as  follows. 

Probability theory has strong intellectual support and in principle
there is no reason why one should not be satisfied with this theory. It
does, however, present enormous problems of complexity and of
independence judgments, and as a matter of practice it is necessary to
seek approximations. Fuzzy set theory can be viewed as a heuristic
for handling those situations where imprecise inputs and imprecise
inferences are required without the need to resort to the greater
complexity of probability theory. Belief function theory can be thought
of as a way of representing inferences from evidence within the
probabilistic framework.

There are yet other alternative approaches to handling uncertain
inferences which were not mentioned at the conference, and notable among
these is the non-monotonic logic of Doyle. Recently Cohen (Cohen et al.
1985) has suggested a combination of Doyle's theory with both Shafer's
and Zadeh's, which he has referred to as the non-monotonic probabilist.
This seems an exciting possible approach to the problem at the
heart of this conference.

Reference 

Cohen, M. S., Watson, S. R., and Barrett, E., 'Alternative Theories of
Inference in Expert Systems for Image Analysis', Technical Report 85-1,
Decision Science Consortium, Falls Church, VA, January 1985.


REPORT DOCUMENTATION PAGE

SECURITY CLASSIFICATION OF THIS PAGE

1a.  Report Security Classification: Unclassified
1b.  Restrictive Markings:
2a.  Security Classification Authority:
2b.  Declassification/Downgrading Schedule:
3.   Distribution/Availability of Report: Unlimited
4.   Performing Organization Report Number(s): GWU/IRRA/Serial TR-86/2
5.   Monitoring Organization Report Number(s):
6a.  Name of Performing Organization: The George Washington Univ.,
     Inst. for Reliability & Risk Analysis
6b.  Office Symbol (if applicable):
6c.  Address (City, State and ZIP Code): 707 22nd St., NW,
     Washington, DC 20052
7a.  Name of Monitoring Organization: Office of Naval Research
7b.  Address (City, State and ZIP Code): 800 N. Quincy St.,
     Arlington, VA 22217-5000
8a.  Name of Funding/Sponsoring Organization: Office of Naval Research
8b.  Office Symbol (if applicable): ONR-1111
8c.  Address (City, State and ZIP Code): 800 N. Quincy St.,
     Arlington, VA 22217-5000
9.   Procurement Instrument Identification Number: N00014-85-G-0162
10.  Source of Funding Nos. (program element, project, task, work unit):
     61153N 14; 4118; NR 4118-128 (NR347-128); 4118-128-01
11.  Title (Include Security Classification): The Calculus of
     Uncertainty in Artificial Intelligence and Expert Systems (Unclass)
12.  Personal Author(s): N.D. Singpurwalla, Principal Investigator;
     S.M. Selig, Coordinating Editor; See #19
13a. Type of Report: Final
13b. Time Covered: From 1 Dec 84 To
14.  Date of Report (Yr., Mo., Day): 1986/01/15
15.  Page Count: 268
16.  Supplementary Notation: Proceedings of Conference held
     December 28-29, 1984
17.  COSATI Codes:
18.  Subject Terms: Artificial Intelligence, Expert Systems, Uncertainty
19.  Abstract: This is a collection of papers, presentations and
     discussions at a conference on dealing with uncertainty in
     artificial intelligence and expert systems. Three different
     approaches were examined, namely (1) the use of belief functions,
     (2) fuzzy set logic, and (3) probability. A case study example of a
     probabilistic approach for expert systems in medicine was
     presented. Authors of individual papers are G. Shafer, L.A. Zadeh,
     D.V. Lindley, and D.J. Spiegelhalter.
20.  Distribution/Availability of Abstract: Unclassified/Unlimited
21.  Abstract Security Classification: Unclassified
22a. Name of Responsible Individual: Edward J. Wegman
22b. Telephone Number (Include Area Code): (202) 696-4310
22c. Office Symbol: ONR-1111

DD FORM 1473, 83 APR. EDITION OF 1 JAN 73 IS OBSOLETE.